.
├── Introduction
│   └── Jupyter hack!!
│
├── Working with Beautiful Soup
│   └── Searching with Beautiful Soup
│       ├── Task1: Football Table
│       ├── Task2: Digikala Laptop Search
│       └── cleaning the table
│
├── Working with Selenium
│   └── Selenium Webdriver basics
│       ├─ Another example
│       └─ Waits
│
└── Task3: Extracting ticket informations
    ├── Crawling Mrbilit
    └── Cleaning & Joining the Tables
In this hands-on exercise, you will become familiar with the concepts listed in the outline above.
Some of the Web Scraping examples are inspired by this awesome free Persian course by Mr. Hossein Khorang. Please check it out for more insight.
Run the code below. After that, pressing TAB while you write code shows a list of the available functions and objects, so you can enjoy auto-completion. I recommend going wild with this feature and using it all the time! You can also press SHIFT + TAB with the cursor on any function or variable to see its documentation.
%config Completer.use_jedi = False
We can send a GET request to a web page and receive its front-end source code (the HTML that the browser would render). The raw source is usually messy and difficult to parse by hand, as the output of the request below shows.
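The request in this section prints two encodings, `response.encoding` and `response.apparent_encoding`, and they do not always agree. As a minimal sketch of why that matters, the snippet below decodes the same bytes in two different ways. The sample string is made up for illustration and is not part of the original notebook; only the Python standard library is used.

```python
# Illustrative sketch: the sample text is hypothetical, chosen only to
# contain characters outside plain ASCII.
sample = "café ▲"                    # text as the page author intended it
raw_bytes = sample.encode("utf-8")   # the bytes a server would send

decoded_ok = raw_bytes.decode("utf-8")    # decoded with the right encoding
decoded_bad = raw_bytes.decode("cp1252")  # decoded as Windows-1252 instead

print(decoded_ok)    # matches the original text
print(decoded_bad)   # garbled: each multi-byte character becomes several
print(decoded_ok == sample, decoded_bad == sample)  # -> True False
```

With `requests`, `response.text` is decoded using `response.encoding`, so if the printed page looks garbled, assigning a different value to `response.encoding` before reading `response.text` is one thing to try.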
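import requests

url = 'https://python.org'
response = requests.get(url)       # send an HTTP GET request to the page

print(response.encoding)           # encoding requests uses to decode the body
print(response.apparent_encoding)  # encoding guessed from the body content
print(response.text)               # the decoded HTML source of the page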
utf-8
Windows-1252
<!doctype html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
<link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js">
<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">
<script src="/static/js/libs/modernizr.js"></script>
<link href="/static/stylesheets/style.7c9ba80a645a.css" rel="stylesheet" type="text/css" media="all" title="default" />
<link href="/static/stylesheets/mq.f9187444a4a1.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty" />
<!--[if (lte IE 8)&(!IEMobile)]>
<link href="/static/stylesheets/no-mq.bf0c425cdb73.css" rel="stylesheet" type="text/css" media="screen" />
<![endif]-->
<link rel="stylesheet" href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/themes/smoothness/jquery-ui.css">
<link rel="icon" type="image/x-icon" href="/static/favicon.ico">
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="/static/apple-touch-icon-144x144-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="/static/apple-touch-icon-114x114-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="/static/apple-touch-icon-72x72-precomposed.png">
<link rel="apple-touch-icon-precomposed" href="/static/apple-touch-icon-precomposed.png">
<link rel="apple-touch-icon" href="/static/apple-touch-icon-precomposed.png">
<meta name="msapplication-TileImage" content="/static/metro-icon-144x144-precomposed.png"><!-- white shape -->
<meta name="msapplication-TileColor" content="#3673a5"><!-- python blue -->
<meta name="msapplication-navbutton-color" content="#3673a5">
<title>Welcome to Python.org</title>
<meta name="description" content="The official home of the Python Programming Language">
<meta name="keywords" content="Python programming language object oriented web free open source software license documentation download community">
<meta property="og:type" content="website">
<meta property="og:site_name" content="Python.org">
<meta property="og:title" content="Welcome to Python.org">
<meta property="og:description" content="The official home of the Python Programming Language">
<meta property="og:image" content="https://www.python.org/static/opengraph-icon-200x200.png">
<meta property="og:image:secure_url" content="https://www.python.org/static/opengraph-icon-200x200.png">
<meta property="og:url" content="https://www.python.org/">
<link rel="author" href="/static/humans.txt">
<link rel="alternate" type="application/rss+xml" title="Python Enhancement Proposals"
href="https://www.python.org/dev/peps/peps.rss/">
<link rel="alternate" type="application/rss+xml" title="Python Job Opportunities"
href="https://www.python.org/jobs/feed/rss/">
<link rel="alternate" type="application/rss+xml" title="Python Software Foundation News"
href="https://feeds.feedburner.com/PythonSoftwareFoundationNews">
<link rel="alternate" type="application/rss+xml" title="Python Insider"
href="https://feeds.feedburner.com/PythonInsider">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "WebSite",
"url": "https://www.python.org/",
"potentialAction": {
"@type": "SearchAction",
"target": "https://www.python.org/search/?q={search_term_string}",
"query-input": "required name=search_term_string"
}
}
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-39055973-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body class="python home" id="homepage">
<div id="touchnav-wrapper">
<div id="nojs" class="do-not-print">
<p><strong>Notice:</strong> While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. </p>
</div>
<!--[if lte IE 8]>
<div id="oldie-warning" class="do-not-print">
<p>
<strong>Notice:</strong> Your browser is <em>ancient</em>. Please
<a href="http://browsehappy.com/">upgrade to a different browser</a> to experience a better web.
</p>
</div>
<![endif]-->
<!-- Sister Site Links -->
<div id="top" class="top-bar do-not-print">
<nav class="meta-navigation container" role="navigation">
<div class="skip-link screen-reader-text">
<a href="#content" title="Skip to content">Skip to content</a>
</div>
<a id="close-python-network" class="jump-link" href="#python-network" aria-hidden="true">
<span aria-hidden="true" class="icon-arrow-down"><span>▼</span></span> Close
</a>
<ul class="menu" role="tree">
<li class="python-meta current_item selectedcurrent_branch selected">
<a href="/" title="The Python Programming Language" class="current_item selectedcurrent_branch selected">Python</a>
</li>
<li class="psf-meta ">
<a href="/psf-landing/" title="The Python Software Foundation" >PSF</a>
</li>
<li class="docs-meta ">
<a href="https://docs.python.org" title="Python Documentation" >Docs</a>
</li>
<li class="pypi-meta ">
<a href="https://pypi.org/" title="Python Package Index" >PyPI</a>
</li>
<li class="jobs-meta ">
<a href="/jobs/" title="Python Job Board" >Jobs</a>
</li>
<li class="shop-meta ">
<a href="/community-landing/" >Community</a>
</li>
</ul>
<a id="python-network" class="jump-link" href="#top" aria-hidden="true">
<span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> The Python Network
</a>
</nav>
</div>
<!-- Header elements -->
<header class="main-header" role="banner">
<div class="container">
<h1 class="site-headline">
<a href="/"><img class="python-logo" src="/static/img/python-logo.png" alt="python™"></a>
</h1>
<div class="options-bar-container do-not-print">
<a href="https://psfmember.org/civicrm/contribute/transact?reset=1&id=2" class="donate-button">Donate</a>
<div class="options-bar">
<a id="site-map-link" class="jump-to-menu" href="#site-map"><span class="menu-icon">≡</span> Menu</a><form class="search-the-site" action="/search/" method="get">
<fieldset title="Search Python.org">
<span aria-hidden="true" class="icon-search"></span>
<label class="screen-reader-text" for="id-search-field">Search This Site</label>
<input id="id-search-field" name="q" type="search" role="textbox" class="search-field" placeholder="Search" value="" tabindex="1">
<button type="submit" name="submit" id="submit" class="search-button" title="Submit this Search" tabindex="3">
GO
</button>
<!--[if IE]><input type="text" style="display: none;" disabled="disabled" size="1" tabindex="4"><![endif]-->
</fieldset>
</form><span class="breaker"></span><div class="adjust-font-size" aria-hidden="true">
<ul class="navigation menu" aria-label="Adjust Text Size on Page">
<li class="tier-1 last" aria-haspopup="true">
<a href="#" class="action-trigger"><strong><small>A</small> A</strong></a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a class="text-shrink" title="Make Text Smaller" href="javascript:;">Smaller</a></li>
<li class="tier-2 element-2" role="treeitem"><a class="text-grow" title="Make Text Larger" href="javascript:;">Larger</a></li>
<li class="tier-2 element-3" role="treeitem"><a class="text-reset" title="Reset any font size changes I have made" href="javascript:;">Reset</a></li>
</ul>
</li>
</ul>
</div><div class="winkwink-nudgenudge">
<ul class="navigation menu" aria-label="Social Media Navigation">
<li class="tier-1 last" aria-haspopup="true">
<a href="#" class="action-trigger">Socialize</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="https://www.facebook.com/pythonlang?fref=ts"><span aria-hidden="true" class="icon-facebook"></span>Facebook</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="https://twitter.com/ThePSF"><span aria-hidden="true" class="icon-twitter"></span>Twitter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/irc/"><span aria-hidden="true" class="icon-freenode"></span>Chat on IRC</a></li>
</ul>
</li>
</ul>
</div>
<span data-html-include="/authenticated"></span>
</div><!-- end options-bar -->
</div>
<nav id="mainnav" class="python-navigation main-navigation do-not-print" role="navigation">
<ul class="navigation menu" role="menubar" aria-label="Main Navigation">
<li id="about" class="tier-1 element-1 " aria-haspopup="true">
<a href="/about/" title="" class="">About</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>
<li id="downloads" class="tier-1 element-2 " aria-haspopup="true">
<a href="/downloads/" title="" class="">Downloads</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/downloads/macos/" title="">macOS</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>
</ul>
</li>
<li id="documentation" class="tier-1 element-3 " aria-haspopup="true">
<a href="/doc/" title="" class="">Documentation</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>
</ul>
</li>
<li id="community" class="tier-1 element-4 " aria-haspopup="true">
<a href="/community/" title="" class="">Community</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/community/diversity/" title="">Diversity</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>
<li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>
<li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>
<li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>
<li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>
<li class="tier-2 element-14" role="treeitem"><a href="/psf/get-involved/" title="">Get Involved</a></li>
<li class="tier-2 element-15" role="treeitem"><a href="/psf/community-stories/" title="">Shared Stories</a></li>
</ul>
</li>
<li id="success-stories" class="tier-1 element-5 " aria-haspopup="true">
<a href="/success-stories/" title="success-stories" class="">Success Stories</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" title="">Business</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>
</ul>
</li>
<li id="news" class="tier-1 element-6 " aria-haspopup="true">
<a href="/blogs/" title="News from around the Python world" class="">News</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
</ul>
</li>
<li id="events" class="tier-1 element-7 " aria-haspopup="true">
<a href="/events/" title="" class="">Events</a>
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/events/python-events/" title="">Python Events</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/events/python-user-group/" title="">User Group Events</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>
</ul>
</li>
</ul>
</nav>
<div class="header-banner "> <!-- for optional "do-not-print" class -->
<div id="dive-into-python" class="flex-slideshow slideshow">
<ul class="launch-shell menu" id="launch-shell">
<li>
<a class="button prompt" id="start-shell" data-shell-container="#dive-into-python" href="/shell/">>_
<span class="message">Launch Interactive Shell</span>
</a>
</li>
</ul>
<ul class="slides menu">
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Fibonacci series up to n</span>
>>> def fib(n):
>>> a, b = 0, 1
>>> while a < n:
>>> print(a, end=' ')
>>> a, b = b, a+b
>>> print()
>>> fib(1000)
<span class="output">0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</span></code></pre></div>
<div class="slide-copy"><h1>Functions Defined</h1>
<p>The core of extensible programming is defining functions. Python allows mandatory and optional arguments, keyword arguments, and even arbitrary argument lists. <a href="//docs.python.org/3/tutorial/controlflow.html#defining-functions">More about defining functions in Python 3</a></p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: List comprehensions</span>
>>> fruits = ['Banana', 'Apple', 'Lime']
>>> loud_fruits = [fruit.upper() for fruit in fruits]
>>> print(loud_fruits)
<span class="output">['BANANA', 'APPLE', 'LIME']</span>
<span class="comment"># List and the enumerate function</span>
>>> list(enumerate(fruits))
<span class="output">[(0, 'Banana'), (1, 'Apple'), (2, 'Lime')]</span></code></pre></div>
<div class="slide-copy"><h1>Compound Data Types</h1>
<p>Lists (known as arrays in other languages) are one of the compound data types that Python understands. Lists can be indexed, sliced and manipulated with other built-in functions. <a href="//docs.python.org/3/tutorial/introduction.html#lists">More about lists in Python 3</a></p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Simple arithmetic</span>
>>> 1 / 2
<span class="output">0.5</span>
>>> 2 ** 3
<span class="output">8</span>
>>> 17 / 3 <span class="comment"># classic division returns a float</span>
<span class="output">5.666666666666667</span>
>>> 17 // 3 <span class="comment"># floor division</span>
<span class="output">5</span></code></pre></div>
<div class="slide-copy"><h1>Intuitive Interpretation</h1>
<p>Calculations are simple with Python, and expression syntax is straightforward: the operators <code>+</code>, <code>-</code>, <code>*</code> and <code>/</code> work as expected; parentheses <code>()</code> can be used for grouping. <a href="http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator">More about simple math functions in Python 3</a>.</p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Simple output (with Unicode)</span>
>>> print("Hello, I'm Python!")
<span class="output">Hello, I'm Python!</span>
<span class="comment"># Input, assignment</span>
>>> name = input('What is your name?\n')
>>> print('Hi, %s.' % name)
<span class="output">What is your name?
Python
Hi, Python.</span></code></pre></div>
<div class="slide-copy"><h1>Quick & Easy to Learn</h1>
<p>Experienced programmers in any other language can pick up Python very quickly, and beginners find the clean syntax and indentation structure easy to learn. <a href="//docs.python.org/3/tutorial/">Whet your appetite</a> with our Python 3 overview.</p>
</div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># For loop on a list</span>
>>> numbers = [2, 4, 6, 8]
>>> product = 1
>>> for number in numbers:
... product = product * number
...
>>> print('The product is:', product)
<span class="output">The product is: 384</span></code></pre></div>
<div class="slide-copy"><h1>All the Flow You’d Expect</h1>
<p>Python knows the usual control flow statements that other languages speak — <code>if</code>, <code>for</code>, <code>while</code> and <code>range</code> — with some of its own twists, of course. <a href="//docs.python.org/3/tutorial/controlflow.html">More control flow tools in Python 3</a></p></div>
</li>
</ul>
</div>
</div>
<div class="introduction">
<p>Python is a programming language that lets you work quickly <span class="breaker"></span>and integrate systems more effectively. <a class="readmore" href="/doc/">Learn More</a></p>
</div>
</div><!-- end .container -->
</header>
<div id="content" class="content-wrapper">
<!-- Main Content Column -->
<div class="container">
<section class="main-content " role="main">
<div class="notification-bar notification-bar--survey" style="background-color: #ffdf76; color: #664e04; border-color: #004d7a; text-align: center; background-color: #004d7a; color: #fff; padding: 10px; margin: .5em; position: relative; width: 95%; background-color: #ffdf76; color: #664e04; border-color: #004d7a; border-radius: 1em;">
<span class="notification-bar__icon">
<i class="fa fa-chart-line" aria-hidden="true"></i>
</span>
<span class="notification-bar__message">Participate in the official 2021 Python Developers Survey <a class="button button--dark button--small button--primary" style="color: #606060; border-color: #006dad; background-color: #006dad;" href="https://surveys.jetbrains.com/s3/c1-python-developers-survey-2021" target="_blank" rel="noopener">Take the 2021 survey!</a>
</span>
</div>
<div class="row">
<div class="small-widget get-started-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Get Started</h2>
<p>Whether you're new to programming or an experienced developer, it's easy to learn and use Python.</p>
<p><a href="/about/gettingstarted/">Start with our Beginner’s Guide</a></p>
</div>
<div class="small-widget download-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-download"></span>Download</h2>
<p>Python source code and installers are available for download for all versions!</p>
<p>Latest: <a href="/downloads/release/python-3100/">Python 3.10.0</a></p>
</div>
<div class="small-widget documentation-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-documentation"></span>Docs</h2>
<p>Documentation for Python's standard library, along with tutorials and guides, are available online.</p>
<p><a href="https://docs.python.org">docs.python.org</a></p>
</div>
<div class="small-widget jobs-widget last">
<h2 class="widget-title"><span aria-hidden="true" class="icon-jobs"></span>Jobs</h2>
<p>Looking for work or have a Python related position that you're trying to hire for? Our <strong>relaunched community-run job board</strong> is the place to go.</p>
<p><a href="//jobs.python.org">jobs.python.org</a></p>
</div>
</div>
<div class="list-widgets row">
<div class="medium-widget blog-widget">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-news"></span>Latest News</h2>
<p class="give-me-more"><a href="https://blog.python.org" title="More News">More</a></p>
<ul class="menu">
<li>
<time datetime="2021-10-26T08:06:00.000001+00:00"><span class="say-no-more">2021-</span>10-26</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/ZDUoSt7NaWc/vicky-twomey-lee-awarded-psf-community.html">Vicky Twomey-Lee Awarded the PSF Community Service Award for Q3 2021</a></li>
<li>
<time datetime="2021-10-19T15:30:00.000001+00:00"><span class="say-no-more">2021-</span>10-19</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/T_dCR6_vuA8/announcing-python-software-foundation.html">Announcing Python Software Foundation Fellow Members for Q3 2021! 🎉</a></li>
<li>
<time datetime="2021-10-18T18:16:00+00:00"><span class="say-no-more">2021-</span>10-18</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/M9jMg4myXFk/join-python-developers-survey-2021.html">Join the Python Developers Survey 2021: Share and learn about the community</a></li>
<li>
<time datetime="2021-10-07T12:03:00.000003+00:00"><span class="say-no-more">2021-</span>10-07</time>
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/rfZ4c8nXGdk/python-3110a1-is-available.html">Python 3.11.0a1 is available</a></li>
<li>
<time datetime="2021-10-04T21:07:00+00:00"><span class="say-no-more">2021-</span>10-04</time>
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/ojK529j7CAQ/python-3100-is-available.html">Python 3.10.0 is available</a></li>
</ul>
</div><!-- end .shrubbery -->
</div>
<div class="medium-widget event-widget last">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-calendar"></span>Upcoming Events</h2>
<p class="give-me-more"><a href="/events/calendars/" title="More Events">More</a></p>
<ul class="menu">
<li>
<time datetime="2021-11-05T00:00:00+00:00"><span class="say-no-more">2021-</span>11-05</time>
<a href="/events/python-events/1140/">PyCon Chile</a></li>
<li>
<time datetime="2021-11-13T09:00:00+00:00"><span class="say-no-more">2021-</span>11-13</time>
<a href="/events/python-user-group/1148/">Django Girls Groningen</a></li>
<li>
<time datetime="2021-11-15T00:00:00+00:00"><span class="say-no-more">2021-</span>11-15</time>
<a href="/events/python-events/1154/">PyCon Japan 2021</a></li>
<li>
<time datetime="2021-11-19T00:00:00+00:00"><span class="say-no-more">2021-</span>11-19</time>
<a href="/events/python-events/1104/">PyCon APAC 2021</a></li>
<li>
<time datetime="2021-11-24T00:00:00+00:00"><span class="say-no-more">2021-</span>11-24</time>
<a href="/events/python-events/1044/">Xtreme Python</a></li>
</ul>
</div>
</div>
</div>
<div class="row">
<div class="medium-widget success-stories-widget">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-success-stories"></span>Success Stories</h2>
<p class="give-me-more"><a href="/success-stories/" title="More Success Stories">More</a></p>
<div class="success-story-item" id="success-story-836">
<blockquote>
<a href="/success-stories/python-seo-link-analyzer/">"Python is all about automating repetitive tasks, leaving more time for your other SEO efforts."</a>
</blockquote>
<table cellpadding="0" cellspacing="0" border="0" width="100%" class="quote-from">
<tbody>
<tr>
<td><p><a href="/success-stories/python-seo-link-analyzer/">Using Python scripts to analyse SEO and broken links on your site</a> <em>by Marnix de Munck</em></p></td>
</tr>
</tbody>
</table>
</div>
</div><!-- end .shrubbery -->
</div>
<div class="medium-widget applications-widget last">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-python"></span>Use Python for…</h2>
<p class="give-me-more"><a href="/about/apps" title="More Applications">More</a></p>
<ul class="menu">
<li><b>Web Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://www.djangoproject.com/">Django</a>, <a class="tag" href="http://www.pylonsproject.org/">Pyramid</a>, <a class="tag" href="http://bottlepy.org">Bottle</a>, <a class="tag" href="http://tornadoweb.org">Tornado</a>, <a href="http://flask.pocoo.org/" class="tag">Flask</a>, <a class="tag" href="http://www.web2py.com/">web2py</a></span></li>
<li><b>GUI Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://wiki.python.org/moin/TkInter">tkInter</a>, <a class="tag" href="https://wiki.gnome.org/Projects/PyGObject">PyGObject</a>, <a class="tag" href="http://www.riverbankcomputing.co.uk/software/pyqt/intro">PyQt</a>, <a class="tag" href="https://wiki.qt.io/PySide">PySide</a>, <a class="tag" href="https://kivy.org/">Kivy</a>, <a class="tag" href="http://www.wxpython.org/">wxPython</a></span></li>
<li><b>Scientific and Numeric</b>:
<span class="tag-wrapper">
<a class="tag" href="http://www.scipy.org">SciPy</a>, <a class="tag" href="http://pandas.pydata.org/">Pandas</a>, <a href="http://ipython.org" class="tag">IPython</a></span></li>
<li><b>Software Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://buildbot.net/">Buildbot</a>, <a class="tag" href="http://trac.edgewall.org/">Trac</a>, <a class="tag" href="http://roundup.sourceforge.net/">Roundup</a></span></li>
<li><b>System Administration</b>:
<span class="tag-wrapper"><a class="tag" href="http://www.ansible.com">Ansible</a>, <a class="tag" href="http://www.saltstack.com">Salt</a>, <a class="tag" href="https://www.openstack.org">OpenStack</a>, <a class="tag" href="https://xon.sh">xonsh</a></span></li>
</ul>
</div><!-- end .shrubbery -->
</div>
</div>
<div class="pep-widget">
<h2 class="widget-title">
<span class="prompt">>>></span> <a href="/dev/peps/">Python Enhancement Proposals<span class="say-no-more"> (PEPs)</span></a>: The future of Python<span class="say-no-more"> is discussed here.</span>
<a aria-hidden="true" class="rss-link" href="/dev/peps/peps.rss"><span class="icon-feed"></span> RSS</a>
</h2>
</div>
<div class="psf-widget">
<div class="python-logo"></div>
<h2 class="widget-title">
<span class="prompt">>>></span> <a href="/psf/">Python Software Foundation</a>
</h2>
<p>The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers. <a class="readmore" href="/psf/">Learn more</a> </p>
<p class="click-these">
<a class="button" href="/users/membership/">Become a Member</a>
<a class="button" href="/psf/donations/">Donate to the PSF</a>
</p>
</div>
</section>
</div><!-- end .container -->
</div><!-- end #content .content-wrapper -->
<!-- Footer and social media list -->
<footer id="site-map" class="main-footer" role="contentinfo">
<div class="main-footer-links">
<div class="container">
<a id="back-to-top-1" class="jump-link" href="#python-network"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>
<ul class="sitemap navigation menu do-not-print" role="tree" id="container">
<li class="tier-1 element-1">
<a href="/about/" >About</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>
<li class="tier-1 element-2">
<a href="/downloads/" >Downloads</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/downloads/macos/" title="">macOS</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>
</ul>
</li>
<li class="tier-1 element-3">
<a href="/doc/" >Documentation</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>
</ul>
</li>
<li class="tier-1 element-4">
<a href="/community/" >Community</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/community/diversity/" title="">Diversity</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>
<li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>
<li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>
<li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>
<li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>
<li class="tier-2 element-14" role="treeitem"><a href="/psf/get-involved/" title="">Get Involved</a></li>
<li class="tier-2 element-15" role="treeitem"><a href="/psf/community-stories/" title="">Shared Stories</a></li>
</ul>
</li>
<li class="tier-1 element-5">
<a href="/success-stories/" title="success-stories">Success Stories</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" title="">Business</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>
</ul>
</li>
<li class="tier-1 element-6">
<a href="/blogs/" title="News from around the Python world">News</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
</ul>
</li>
<li class="tier-1 element-7">
<a href="/events/" >Events</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/events/python-events/" title="">Python Events</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/events/python-user-group/" title="">User Group Events</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>
</ul>
</li>
<li class="tier-1 element-8">
<a href="/dev/" >Contributing</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="https://bugs.python.org/" title="">Issue Tracker</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://mail.python.org/mailman/listinfo/python-dev" title="">python-dev list</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/dev/core-mentorship/" title="">Core Mentorship</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/dev/security/" title="">Report a Security Issue</a></li>
</ul>
</li>
</ul>
<a id="back-to-top-2" class="jump-link" href="#python-network"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>
</div><!-- end .container -->
</div> <!-- end .main-footer-links -->
<div class="site-base">
<div class="container">
<ul class="footer-links navigation menu do-not-print" role="tree">
<li class="tier-1 element-1"><a href="/about/help/">Help & <span class="say-no-more">General</span> Contact</a></li>
<li class="tier-1 element-2"><a href="/community/diversity/">Diversity <span class="say-no-more">Initiatives</span></a></li>
<li class="tier-1 element-3"><a href="https://github.com/python/pythondotorg/issues">Submit Website Bug</a></li>
<li class="tier-1 element-4">
<a href="https://status.python.org/">Status <span class="python-status-indicator-default" id="python-status-indicator"></span></a>
</li>
</ul>
<div class="copyright">
<p><small>
<span class="pre">Copyright ©2001-2021.</span>
<span class="pre"><a href="/psf-landing/">Python Software Foundation</a></span>
<span class="pre"><a href="/about/legal/">Legal Statements</a></span>
<span class="pre"><a href="/privacy/">Privacy Policy</a></span>
<span class="pre"><a href="/psf/sponsorship/sponsors/#heroku">Powered by Heroku</a></span>
</small></p>
</div>
</div><!-- end .container -->
</div><!-- end .site-base -->
</footer>
</div><!-- end #touchnav-wrapper -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="/static/js/libs/jquery-1.8.2.min.js"><\/script>')</script>
<script src="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>
<script>window.jQuery || document.write('<script src="/static/js/libs/jquery-ui-1.12.1.min.js"><\/script>')</script>
<script src="/static/js/libs/masonry.pkgd.min.js"></script>
<script src="/static/js/libs/html-includes.js"></script>
<script type="text/javascript" src="/static/js/main-min.dd72c1659644.js" charset="utf-8"></script>
<!--[if lte IE 7]>
<script type="text/javascript" src="/static/js/plugins/IE8-min.8af6e26c7a3b.js" charset="utf-8"></script>
<![endif]-->
<!--[if lte IE 8]>
<script type="text/javascript" src="/static/js/plugins/getComputedStyle-min.d41d8cd98f00.js" charset="utf-8"></script>
<![endif]-->
</body>
</html>
All you need is a beautiful soup!
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
Please install it in your conda environment:
!conda install -y -c anaconda beautifulsoup4
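Once the package is installed, a quick way to see what "navigating and searching the parse tree" means is to parse a tiny hand-written page. This is only a sketch: the HTML string and the variable names (`html`, `demo_soup`, `links`, and so on) are invented for illustration and are not taken from python.org.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for this illustration
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <ul class="menu">
      <li><a href="/about/">About</a></li>
      <li><a href="/downloads/">Downloads</a></li>
    </ul>
  </body>
</html>
"""

# Parse the string with Python's built-in HTML parser
demo_soup = BeautifulSoup(html, "html.parser")

# Navigate by tag name: the text inside the <title> tag
page_title = demo_soup.title.string

# Search: collect (text, href) for every <a> tag in the document
links = [(a.get_text(), a["href"]) for a in demo_soup.find_all("a")]

# Combine both: find the <ul class="menu">, then step into its first <li>
first_item = demo_soup.find("ul", class_="menu").li.get_text()

print(page_title)
print(links)
print(first_item)
```

The downloaded python.org page is parsed the same way; only the input string differs.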
from bs4 import BeautifulSoup
# Beautiful Soup takes the HTML source and the name of a parser as input
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" dir="ltr" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<link href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js" rel="prefetch"/>
<link href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js" rel="prefetch"/>
<meta content="Python.org" name="application-name"/>
<meta content="The official home of the Python Programming Language" name="msapplication-tooltip"/>
<meta content="Python.org" name="apple-mobile-web-app-title"/>
<meta content="yes" name="apple-mobile-web-app-capable"/>
<meta content="black" name="apple-mobile-web-app-status-bar-style"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="True" name="HandheldFriendly"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="on" http-equiv="cleartype"/>
<meta content="false" http-equiv="imagetoolbar"/>
<script src="/static/js/libs/modernizr.js"></script>
<link href="/static/stylesheets/style.7c9ba80a645a.css" media="all" rel="stylesheet" title="default" type="text/css">
<link href="/static/stylesheets/mq.f9187444a4a1.css" media="not print, braille, embossed, speech, tty" rel="stylesheet" type="text/css">
<!--[if (lte IE 8)&(!IEMobile)]>
<link href="/static/stylesheets/no-mq.bf0c425cdb73.css" rel="stylesheet" type="text/css" media="screen" />
<![endif]-->
<link href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/themes/smoothness/jquery-ui.css" rel="stylesheet"/>
<link href="/static/favicon.ico" rel="icon" type="image/x-icon"/>
<link href="/static/apple-touch-icon-144x144-precomposed.png" rel="apple-touch-icon-precomposed" sizes="144x144"/>
<link href="/static/apple-touch-icon-114x114-precomposed.png" rel="apple-touch-icon-precomposed" sizes="114x114"/>
<link href="/static/apple-touch-icon-72x72-precomposed.png" rel="apple-touch-icon-precomposed" sizes="72x72"/>
<link href="/static/apple-touch-icon-precomposed.png" rel="apple-touch-icon-precomposed"/>
<link href="/static/apple-touch-icon-precomposed.png" rel="apple-touch-icon"/>
<meta content="/static/metro-icon-144x144-precomposed.png" name="msapplication-TileImage"/><!-- white shape -->
<meta content="#3673a5" name="msapplication-TileColor"/><!-- python blue -->
<meta content="#3673a5" name="msapplication-navbutton-color"/>
<title>Welcome to Python.org</title>
<meta content="The official home of the Python Programming Language" name="description"/>
<meta content="Python programming language object oriented web free open source software license documentation download community" name="keywords"/>
<meta content="website" property="og:type"/>
<meta content="Python.org" property="og:site_name"/>
<meta content="Welcome to Python.org" property="og:title"/>
<meta content="The official home of the Python Programming Language" property="og:description"/>
<meta content="https://www.python.org/static/opengraph-icon-200x200.png" property="og:image"/>
<meta content="https://www.python.org/static/opengraph-icon-200x200.png" property="og:image:secure_url"/>
<meta content="https://www.python.org/" property="og:url"/>
<link href="/static/humans.txt" rel="author"/>
<link href="https://www.python.org/dev/peps/peps.rss/" rel="alternate" title="Python Enhancement Proposals" type="application/rss+xml"/>
<link href="https://www.python.org/jobs/feed/rss/" rel="alternate" title="Python Job Opportunities" type="application/rss+xml"/>
<link href="https://feeds.feedburner.com/PythonSoftwareFoundationNews" rel="alternate" title="Python Software Foundation News" type="application/rss+xml"/>
<link href="https://feeds.feedburner.com/PythonInsider" rel="alternate" title="Python Insider" type="application/rss+xml"/>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "WebSite",
"url": "https://www.python.org/",
"potentialAction": {
"@type": "SearchAction",
"target": "https://www.python.org/search/?q={search_term_string}",
"query-input": "required name=search_term_string"
}
}
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-39055973-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</link></link></head>
<body class="python home" id="homepage">
<div id="touchnav-wrapper">
<div class="do-not-print" id="nojs">
<p><strong>Notice:</strong> While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. </p>
</div>
<!--[if lte IE 8]>
<div id="oldie-warning" class="do-not-print">
<p>
<strong>Notice:</strong> Your browser is <em>ancient</em>. Please
<a href="http://browsehappy.com/">upgrade to a different browser</a> to experience a better web.
</p>
</div>
<![endif]-->
<!-- Sister Site Links -->
<div class="top-bar do-not-print" id="top">
<nav class="meta-navigation container" role="navigation">
<div class="skip-link screen-reader-text">
<a href="#content" title="Skip to content">Skip to content</a>
</div>
<a aria-hidden="true" class="jump-link" href="#python-network" id="close-python-network">
<span aria-hidden="true" class="icon-arrow-down"><span>▼</span></span> Close
</a>
<ul class="menu" role="tree">
<li class="python-meta current_item selectedcurrent_branch selected">
<a class="current_item selectedcurrent_branch selected" href="/" title="The Python Programming Language">Python</a>
</li>
<li class="psf-meta">
<a href="/psf-landing/" title="The Python Software Foundation">PSF</a>
</li>
<li class="docs-meta">
<a href="https://docs.python.org" title="Python Documentation">Docs</a>
</li>
<li class="pypi-meta">
<a href="https://pypi.org/" title="Python Package Index">PyPI</a>
</li>
<li class="jobs-meta">
<a href="/jobs/" title="Python Job Board">Jobs</a>
</li>
<li class="shop-meta">
<a href="/community-landing/">Community</a>
</li>
</ul>
<a aria-hidden="true" class="jump-link" href="#top" id="python-network">
<span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> The Python Network
</a>
</nav>
</div>
<!-- Header elements -->
<header class="main-header" role="banner">
<div class="container">
<h1 class="site-headline">
<a href="/"><img alt="python™" class="python-logo" src="/static/img/python-logo.png"/></a>
</h1>
<div class="options-bar-container do-not-print">
<a class="donate-button" href="https://psfmember.org/civicrm/contribute/transact?reset=1&id=2">Donate</a>
<div class="options-bar">
<a class="jump-to-menu" href="#site-map" id="site-map-link"><span class="menu-icon">≡</span> Menu</a><form action="/search/" class="search-the-site" method="get">
<fieldset title="Search Python.org">
<span aria-hidden="true" class="icon-search"></span>
<label class="screen-reader-text" for="id-search-field">Search This Site</label>
<input class="search-field" id="id-search-field" name="q" placeholder="Search" role="textbox" tabindex="1" type="search" value=""/>
<button class="search-button" id="submit" name="submit" tabindex="3" title="Submit this Search" type="submit">
GO
</button>
<!--[if IE]><input type="text" style="display: none;" disabled="disabled" size="1" tabindex="4"><![endif]-->
</fieldset>
</form><span class="breaker"></span><div aria-hidden="true" class="adjust-font-size">
<ul aria-label="Adjust Text Size on Page" class="navigation menu">
<li aria-haspopup="true" class="tier-1 last">
<a class="action-trigger" href="#"><strong><small>A</small> A</strong></a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a class="text-shrink" href="javascript:;" title="Make Text Smaller">Smaller</a></li>
<li class="tier-2 element-2" role="treeitem"><a class="text-grow" href="javascript:;" title="Make Text Larger">Larger</a></li>
<li class="tier-2 element-3" role="treeitem"><a class="text-reset" href="javascript:;" title="Reset any font size changes I have made">Reset</a></li>
</ul>
</li>
</ul>
</div><div class="winkwink-nudgenudge">
<ul aria-label="Social Media Navigation" class="navigation menu">
<li aria-haspopup="true" class="tier-1 last">
<a class="action-trigger" href="#">Socialize</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="https://www.facebook.com/pythonlang?fref=ts"><span aria-hidden="true" class="icon-facebook"></span>Facebook</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="https://twitter.com/ThePSF"><span aria-hidden="true" class="icon-twitter"></span>Twitter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/irc/"><span aria-hidden="true" class="icon-freenode"></span>Chat on IRC</a></li>
</ul>
</li>
</ul>
</div>
<span data-html-include="/authenticated"></span>
</div><!-- end options-bar -->
</div>
<nav class="python-navigation main-navigation do-not-print" id="mainnav" role="navigation">
<ul aria-label="Main Navigation" class="navigation menu" role="menubar">
<li aria-haspopup="true" class="tier-1 element-1" id="about">
<a class="" href="/about/" title="">About</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-2" id="downloads">
<a class="" href="/downloads/" title="">Downloads</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/downloads/macos/" title="">macOS</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-3" id="documentation">
<a class="" href="/doc/" title="">Documentation</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-4" id="community">
<a class="" href="/community/" title="">Community</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/community/diversity/" title="">Diversity</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>
<li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>
<li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>
<li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>
<li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>
<li class="tier-2 element-14" role="treeitem"><a href="/psf/get-involved/" title="">Get Involved</a></li>
<li class="tier-2 element-15" role="treeitem"><a href="/psf/community-stories/" title="">Shared Stories</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-5" id="success-stories">
<a class="" href="/success-stories/" title="success-stories">Success Stories</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" title="">Business</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-6" id="news">
<a class="" href="/blogs/" title="News from around the Python world">News</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
</ul>
</li>
<li aria-haspopup="true" class="tier-1 element-7" id="events">
<a class="" href="/events/" title="">Events</a>
<ul aria-hidden="true" class="subnav menu" role="menu">
<li class="tier-2 element-1" role="treeitem"><a href="/events/python-events/" title="">Python Events</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/events/python-user-group/" title="">User Group Events</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>
</ul>
</li>
</ul>
</nav>
<div class="header-banner"> <!-- for optional "do-not-print" class -->
<div class="flex-slideshow slideshow" id="dive-into-python">
<ul class="launch-shell menu" id="launch-shell">
<li>
<a class="button prompt" data-shell-container="#dive-into-python" href="/shell/" id="start-shell">>_
<span class="message">Launch Interactive Shell</span>
</a>
</li>
</ul>
<ul class="slides menu">
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Fibonacci series up to n</span>
>>> def fib(n):
>>> a, b = 0, 1
>>> while a < n:
>>> print(a, end=' ')
>>> a, b = b, a+b
>>> print()
>>> fib(1000)
<span class="output">0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987</span></code></pre></div>
<div class="slide-copy"><h1>Functions Defined</h1>
<p>The core of extensible programming is defining functions. Python allows mandatory and optional arguments, keyword arguments, and even arbitrary argument lists. <a href="//docs.python.org/3/tutorial/controlflow.html#defining-functions">More about defining functions in Python 3</a></p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: List comprehensions</span>
>>> fruits = ['Banana', 'Apple', 'Lime']
>>> loud_fruits = [fruit.upper() for fruit in fruits]
>>> print(loud_fruits)
<span class="output">['BANANA', 'APPLE', 'LIME']</span>
<span class="comment"># List and the enumerate function</span>
>>> list(enumerate(fruits))
<span class="output">[(0, 'Banana'), (1, 'Apple'), (2, 'Lime')]</span></code></pre></div>
<div class="slide-copy"><h1>Compound Data Types</h1>
<p>Lists (known as arrays in other languages) are one of the compound data types that Python understands. Lists can be indexed, sliced and manipulated with other built-in functions. <a href="//docs.python.org/3/tutorial/introduction.html#lists">More about lists in Python 3</a></p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Simple arithmetic</span>
>>> 1 / 2
<span class="output">0.5</span>
>>> 2 ** 3
<span class="output">8</span>
>>> 17 / 3 <span class="comment"># classic division returns a float</span>
<span class="output">5.666666666666667</span>
>>> 17 // 3 <span class="comment"># floor division</span>
<span class="output">5</span></code></pre></div>
<div class="slide-copy"><h1>Intuitive Interpretation</h1>
<p>Calculations are simple with Python, and expression syntax is straightforward: the operators <code>+</code>, <code>-</code>, <code>*</code> and <code>/</code> work as expected; parentheses <code>()</code> can be used for grouping. <a href="http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator">More about simple math functions in Python 3</a>.</p></div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># Python 3: Simple output (with Unicode)</span>
>>> print("Hello, I'm Python!")
<span class="output">Hello, I'm Python!</span>
<span class="comment"># Input, assignment</span>
>>> name = input('What is your name?\n')
>>> print('Hi, %s.' % name)
<span class="output">What is your name?
Python
Hi, Python.</span></code></pre></div>
<div class="slide-copy"><h1>Quick & Easy to Learn</h1>
<p>Experienced programmers in any other language can pick up Python very quickly, and beginners find the clean syntax and indentation structure easy to learn. <a href="//docs.python.org/3/tutorial/">Whet your appetite</a> with our Python 3 overview.</p>
</div>
</li>
<li>
<div class="slide-code"><pre><code><span class="comment"># For loop on a list</span>
>>> numbers = [2, 4, 6, 8]
>>> product = 1
>>> for number in numbers:
... product = product * number
...
>>> print('The product is:', product)
<span class="output">The product is: 384</span></code></pre></div>
<div class="slide-copy"><h1>All the Flow You’d Expect</h1>
<p>Python knows the usual control flow statements that other languages speak — <code>if</code>, <code>for</code>, <code>while</code> and <code>range</code> — with some of its own twists, of course. <a href="//docs.python.org/3/tutorial/controlflow.html">More control flow tools in Python 3</a></p></div>
</li>
</ul>
</div>
</div>
<div class="introduction">
<p>Python is a programming language that lets you work quickly <span class="breaker"></span>and integrate systems more effectively. <a class="readmore" href="/doc/">Learn More</a></p>
</div>
</div><!-- end .container -->
</header>
<div class="content-wrapper" id="content">
<!-- Main Content Column -->
<div class="container">
<section class="main-content" role="main">
<div class="notification-bar notification-bar--survey" style="background-color: #ffdf76; color: #664e04; border-color: #004d7a; text-align: center; background-color: #004d7a; color: #fff; padding: 10px; margin: .5em; position: relative; width: 95%; background-color: #ffdf76; color: #664e04; border-color: #004d7a; border-radius: 1em;">
<span class="notification-bar__icon">
<i aria-hidden="true" class="fa fa-chart-line"></i>
</span>
<span class="notification-bar__message">Participate in the official 2021 Python Developers Survey <a class="button button--dark button--small button--primary" href="https://surveys.jetbrains.com/s3/c1-python-developers-survey-2021" rel="noopener" style="color: #606060; border-color: #006dad; background-color: #006dad;" target="_blank">Take the 2021 survey!</a>
</span>
</div>
<div class="row">
<div class="small-widget get-started-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Get Started</h2>
<p>Whether you're new to programming or an experienced developer, it's easy to learn and use Python.</p>
<p><a href="/about/gettingstarted/">Start with our Beginner’s Guide</a></p>
</div>
<div class="small-widget download-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-download"></span>Download</h2>
<p>Python source code and installers are available for download for all versions!</p>
<p>Latest: <a href="/downloads/release/python-3100/">Python 3.10.0</a></p>
</div>
<div class="small-widget documentation-widget">
<h2 class="widget-title"><span aria-hidden="true" class="icon-documentation"></span>Docs</h2>
<p>Documentation for Python's standard library, along with tutorials and guides, are available online.</p>
<p><a href="https://docs.python.org">docs.python.org</a></p>
</div>
<div class="small-widget jobs-widget last">
<h2 class="widget-title"><span aria-hidden="true" class="icon-jobs"></span>Jobs</h2>
<p>Looking for work or have a Python related position that you're trying to hire for? Our <strong>relaunched community-run job board</strong> is the place to go.</p>
<p><a href="//jobs.python.org">jobs.python.org</a></p>
</div>
</div>
<div class="list-widgets row">
<div class="medium-widget blog-widget">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-news"></span>Latest News</h2>
<p class="give-me-more"><a href="https://blog.python.org" title="More News">More</a></p>
<ul class="menu">
<li>
<time datetime="2021-10-26T08:06:00.000001+00:00"><span class="say-no-more">2021-</span>10-26</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/ZDUoSt7NaWc/vicky-twomey-lee-awarded-psf-community.html">Vicky Twomey-Lee Awarded the PSF Community Service Award for Q3 2021</a></li>
<li>
<time datetime="2021-10-19T15:30:00.000001+00:00"><span class="say-no-more">2021-</span>10-19</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/T_dCR6_vuA8/announcing-python-software-foundation.html">Announcing Python Software Foundation Fellow Members for Q3 2021! 🎉</a></li>
<li>
<time datetime="2021-10-18T18:16:00+00:00"><span class="say-no-more">2021-</span>10-18</time>
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/M9jMg4myXFk/join-python-developers-survey-2021.html">Join the Python Developers Survey 2021: Share and learn about the community</a></li>
<li>
<time datetime="2021-10-07T12:03:00.000003+00:00"><span class="say-no-more">2021-</span>10-07</time>
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/rfZ4c8nXGdk/python-3110a1-is-available.html">Python 3.11.0a1 is available</a></li>
<li>
<time datetime="2021-10-04T21:07:00+00:00"><span class="say-no-more">2021-</span>10-04</time>
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/ojK529j7CAQ/python-3100-is-available.html">Python 3.10.0 is available</a></li>
</ul>
</div><!-- end .shrubbery -->
</div>
<div class="medium-widget event-widget last">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-calendar"></span>Upcoming Events</h2>
<p class="give-me-more"><a href="/events/calendars/" title="More Events">More</a></p>
<ul class="menu">
<li>
<time datetime="2021-11-05T00:00:00+00:00"><span class="say-no-more">2021-</span>11-05</time>
<a href="/events/python-events/1140/">PyCon Chile</a></li>
<li>
<time datetime="2021-11-13T09:00:00+00:00"><span class="say-no-more">2021-</span>11-13</time>
<a href="/events/python-user-group/1148/">Django Girls Groningen</a></li>
<li>
<time datetime="2021-11-15T00:00:00+00:00"><span class="say-no-more">2021-</span>11-15</time>
<a href="/events/python-events/1154/">PyCon Japan 2021</a></li>
<li>
<time datetime="2021-11-19T00:00:00+00:00"><span class="say-no-more">2021-</span>11-19</time>
<a href="/events/python-events/1104/">PyCon APAC 2021</a></li>
<li>
<time datetime="2021-11-24T00:00:00+00:00"><span class="say-no-more">2021-</span>11-24</time>
<a href="/events/python-events/1044/">Xtreme Python</a></li>
</ul>
</div>
</div>
</div>
<div class="row">
<div class="medium-widget success-stories-widget">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-success-stories"></span>Success Stories</h2>
<p class="give-me-more"><a href="/success-stories/" title="More Success Stories">More</a></p>
<div class="success-story-item" id="success-story-836">
<blockquote>
<a href="/success-stories/python-seo-link-analyzer/">"Python is all about automating repetitive tasks, leaving more time for your other SEO efforts."</a>
</blockquote>
<table border="0" cellpadding="0" cellspacing="0" class="quote-from" width="100%">
<tbody>
<tr>
<td><p><a href="/success-stories/python-seo-link-analyzer/">Using Python scripts to analyse SEO and broken links on your site</a> <em>by Marnix de Munck</em></p></td>
</tr>
</tbody>
</table>
</div>
</div><!-- end .shrubbery -->
</div>
<div class="medium-widget applications-widget last">
<div class="shrubbery">
<h2 class="widget-title"><span aria-hidden="true" class="icon-python"></span>Use Python for…</h2>
<p class="give-me-more"><a href="/about/apps" title="More Applications">More</a></p>
<ul class="menu">
<li><b>Web Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://www.djangoproject.com/">Django</a>, <a class="tag" href="http://www.pylonsproject.org/">Pyramid</a>, <a class="tag" href="http://bottlepy.org">Bottle</a>, <a class="tag" href="http://tornadoweb.org">Tornado</a>, <a class="tag" href="http://flask.pocoo.org/">Flask</a>, <a class="tag" href="http://www.web2py.com/">web2py</a></span></li>
<li><b>GUI Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://wiki.python.org/moin/TkInter">tkInter</a>, <a class="tag" href="https://wiki.gnome.org/Projects/PyGObject">PyGObject</a>, <a class="tag" href="http://www.riverbankcomputing.co.uk/software/pyqt/intro">PyQt</a>, <a class="tag" href="https://wiki.qt.io/PySide">PySide</a>, <a class="tag" href="https://kivy.org/">Kivy</a>, <a class="tag" href="http://www.wxpython.org/">wxPython</a></span></li>
<li><b>Scientific and Numeric</b>:
<span class="tag-wrapper">
<a class="tag" href="http://www.scipy.org">SciPy</a>, <a class="tag" href="http://pandas.pydata.org/">Pandas</a>, <a class="tag" href="http://ipython.org">IPython</a></span></li>
<li><b>Software Development</b>:
<span class="tag-wrapper"><a class="tag" href="http://buildbot.net/">Buildbot</a>, <a class="tag" href="http://trac.edgewall.org/">Trac</a>, <a class="tag" href="http://roundup.sourceforge.net/">Roundup</a></span></li>
<li><b>System Administration</b>:
<span class="tag-wrapper"><a class="tag" href="http://www.ansible.com">Ansible</a>, <a class="tag" href="http://www.saltstack.com">Salt</a>, <a class="tag" href="https://www.openstack.org">OpenStack</a>, <a class="tag" href="https://xon.sh">xonsh</a></span></li>
</ul>
</div><!-- end .shrubbery -->
</div>
</div>
<div class="pep-widget">
<h2 class="widget-title">
<span class="prompt">>>></span> <a href="/dev/peps/">Python Enhancement Proposals<span class="say-no-more"> (PEPs)</span></a>: The future of Python<span class="say-no-more"> is discussed here.</span>
<a aria-hidden="true" class="rss-link" href="/dev/peps/peps.rss"><span class="icon-feed"></span> RSS</a>
</h2>
</div>
<div class="psf-widget">
<div class="python-logo"></div>
<h2 class="widget-title">
<span class="prompt">>>></span> <a href="/psf/">Python Software Foundation</a>
</h2>
<p>The mission of the Python Software Foundation is to promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers. <a class="readmore" href="/psf/">Learn more</a> </p>
<p class="click-these">
<a class="button" href="/users/membership/">Become a Member</a>
<a class="button" href="/psf/donations/">Donate to the PSF</a>
</p>
</div>
</section>
</div><!-- end .container -->
</div><!-- end #content .content-wrapper -->
<!-- Footer and social media list -->
<footer class="main-footer" id="site-map" role="contentinfo">
<div class="main-footer-links">
<div class="container">
<a class="jump-link" href="#python-network" id="back-to-top-1"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>
<ul class="sitemap navigation menu do-not-print" id="container" role="tree">
<li class="tier-1 element-1">
<a href="/about/">About</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>
</ul>
</li>
<li class="tier-1 element-2">
<a href="/downloads/">Downloads</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/downloads/macos/" title="">macOS</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>
</ul>
</li>
<li class="tier-1 element-3">
<a href="/doc/">Documentation</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>
</ul>
</li>
<li class="tier-1 element-4">
<a href="/community/">Community</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/community/diversity/" title="">Diversity</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>
<li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>
<li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>
<li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>
<li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>
<li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>
<li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>
<li class="tier-2 element-14" role="treeitem"><a href="/psf/get-involved/" title="">Get Involved</a></li>
<li class="tier-2 element-15" role="treeitem"><a href="/psf/community-stories/" title="">Shared Stories</a></li>
</ul>
</li>
<li class="tier-1 element-5">
<a href="/success-stories/" title="success-stories">Success Stories</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" title="">Business</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>
<li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>
<li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>
</ul>
</li>
<li class="tier-1 element-6">
<a href="/blogs/" title="News from around the Python world">News</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
</ul>
</li>
<li class="tier-1 element-7">
<a href="/events/">Events</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="/events/python-events/" title="">Python Events</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="/events/python-user-group/" title="">User Group Events</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>
</ul>
</li>
<li class="tier-1 element-8">
<a href="/dev/">Contributing</a>
<ul class="subnav menu">
<li class="tier-2 element-1" role="treeitem"><a href="https://devguide.python.org/" title="">Developer's Guide</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="https://bugs.python.org/" title="">Issue Tracker</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="https://mail.python.org/mailman/listinfo/python-dev" title="">python-dev list</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="/dev/core-mentorship/" title="">Core Mentorship</a></li>
<li class="tier-2 element-5" role="treeitem"><a href="/dev/security/" title="">Report a Security Issue</a></li>
</ul>
</li>
</ul>
<a class="jump-link" href="#python-network" id="back-to-top-2"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>
</div><!-- end .container -->
</div> <!-- end .main-footer-links -->
<div class="site-base">
<div class="container">
<ul class="footer-links navigation menu do-not-print" role="tree">
<li class="tier-1 element-1"><a href="/about/help/">Help & <span class="say-no-more">General</span> Contact</a></li>
<li class="tier-1 element-2"><a href="/community/diversity/">Diversity <span class="say-no-more">Initiatives</span></a></li>
<li class="tier-1 element-3"><a href="https://github.com/python/pythondotorg/issues">Submit Website Bug</a></li>
<li class="tier-1 element-4">
<a href="https://status.python.org/">Status <span class="python-status-indicator-default" id="python-status-indicator"></span></a>
</li>
</ul>
<div class="copyright">
<p><small>
<span class="pre">Copyright ©2001-2021.</span>
<span class="pre"><a href="/psf-landing/">Python Software Foundation</a></span>
<span class="pre"><a href="/about/legal/">Legal Statements</a></span>
<span class="pre"><a href="/privacy/">Privacy Policy</a></span>
<span class="pre"><a href="/psf/sponsorship/sponsors/#heroku">Powered by Heroku</a></span>
</small></p>
</div>
</div><!-- end .container -->
</div><!-- end .site-base -->
</footer>
</div><!-- end #touchnav-wrapper -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="/static/js/libs/jquery-1.8.2.min.js"><\/script>')</script>
<script src="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>
<script>window.jQuery || document.write('<script src="/static/js/libs/jquery-ui-1.12.1.min.js"><\/script>')</script>
<script src="/static/js/libs/masonry.pkgd.min.js"></script>
<script src="/static/js/libs/html-includes.js"></script>
<script charset="utf-8" src="/static/js/main-min.dd72c1659644.js" type="text/javascript"></script>
<!--[if lte IE 7]>
<script type="text/javascript" src="/static/js/plugins/IE8-min.8af6e26c7a3b.js" charset="utf-8"></script>
<![endif]-->
<!--[if lte IE 8]>
<script type="text/javascript" src="/static/js/plugins/getComputedStyle-min.d41d8cd98f00.js" charset="utf-8"></script>
<![endif]-->
</body>
</html>
Much prettier, eh?
These HTML tags are exactly what you see when you press F12 on a webpage. More specifically, when you right-click an element on a webpage and choose Inspect, you can see which tag it belongs to. Try it for yourself! On www.python.org, inspecting Community should look like this:
We can see which tag this element belongs to (an <a> tag inside a <li> tag). We can also see its attributes, such as the link it points to (href), and its text value ('Community').
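To make the tag, attribute, and text parts concrete, here is a minimal sketch. It assumes a small hand-written HTML snippet (the `html` string and the `snippet` and `link` names are made up for illustration) instead of the live python.org page, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the navigation entry seen in the inspector.
html = '<ul><li><a href="/community-landing/">Community</a></li></ul>'
snippet = BeautifulSoup(html, "html.parser")

link = snippet.a                 # first <a> tag in the snippet
tag_name = link.name             # name of the tag itself
parent_name = link.parent.name   # name of the enclosing tag
href = link["href"]              # attributes are read like dictionary keys
text = link.get_text()           # the visible text inside the tag

print(tag_name, parent_name, href, text)
```

The same three pieces (tag name, attributes, text) are what the search functions in the rest of this section match against.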
Beautiful Soup lets you search through the source code by tag name and attributes. Each call in the code below returns the first <a> tag that satisfies the given conditions.
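# Any attribute can be passed as a keyword argument to narrow the search
print(soup.find('a', href="/community-landing/"))
# `string` matches the tag's text (it replaces the older `text` argument)
print(soup.find('a', title="Skip to content", string='Skip to content'))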
<a href="/community-landing/">Community</a>
<a href="#content" title="Skip to content">Skip to content</a>
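The behaviour of `find` can also be checked on a small offline example. The sketch below assumes a hand-written `html` string (not the real page), and the variable names are illustrative only. It shows that `find` returns the first match, that filters can be given as keyword arguments or through `attrs`, and that `None` comes back when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three links, loosely modelled on the python.org header.
html = """
<nav>
  <a href="#content" title="Skip to content">Skip to content</a>
  <a href="/psf-landing/" title="The Python Software Foundation">PSF</a>
  <a href="/community-landing/">Community</a>
</nav>
"""
demo = BeautifulSoup(html, "html.parser")

first_link = demo.find("a")                            # no filter: first <a> in the document
by_href = demo.find("a", href="/community-landing/")   # filter by keyword argument
by_attrs = demo.find("a", attrs={"title": "The Python Software Foundation"})  # filter by dict
missing = demo.find("a", href="/no-such-page/")        # no match: returns None

print(first_link["href"])
print(by_href.get_text())
print(by_attrs.get_text())
print(missing)
```

Because `find` returns `None` when nothing matches, it is worth checking the result before reading attributes or text from it.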
What if we need to find all elements that satisfy a condition?
# find_all returns a list of every matching tag (findAll is its legacy alias)
a = soup.find_all('a')
a
[<a href="#content" title="Skip to content">Skip to content</a>,
<a aria-hidden="true" class="jump-link" href="#python-network" id="close-python-network">
<span aria-hidden="true" class="icon-arrow-down"><span>▼</span></span> Close
</a>,
<a class="current_item selectedcurrent_branch selected" href="/" title="The Python Programming Language">Python</a>,
<a href="/psf-landing/" title="The Python Software Foundation">PSF</a>,
<a href="https://docs.python.org" title="Python Documentation">Docs</a>,
<a href="https://pypi.org/" title="Python Package Index">PyPI</a>,
<a href="/jobs/" title="Python Job Board">Jobs</a>,
<a href="/community-landing/">Community</a>,
<a aria-hidden="true" class="jump-link" href="#top" id="python-network">
<span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> The Python Network
</a>,
<a href="/"><img alt="python™" class="python-logo" src="/static/img/python-logo.png"/></a>,
<a class="donate-button" href="https://psfmember.org/civicrm/contribute/transact?reset=1&id=2">Donate</a>,
<a class="jump-to-menu" href="#site-map" id="site-map-link"><span class="menu-icon">≡</span> Menu</a>,
<a class="action-trigger" href="#"><strong><small>A</small> A</strong></a>,
<a class="text-shrink" href="javascript:;" title="Make Text Smaller">Smaller</a>,
<a class="text-grow" href="javascript:;" title="Make Text Larger">Larger</a>,
<a class="text-reset" href="javascript:;" title="Reset any font size changes I have made">Reset</a>,
<a class="action-trigger" href="#">Socialize</a>,
<a href="https://www.facebook.com/pythonlang?fref=ts"><span aria-hidden="true" class="icon-facebook"></span>Facebook</a>,
<a href="https://twitter.com/ThePSF"><span aria-hidden="true" class="icon-twitter"></span>Twitter</a>,
<a href="/community/irc/"><span aria-hidden="true" class="icon-freenode"></span>Chat on IRC</a>,
<a class="" href="/about/" title="">About</a>,
<a href="/about/apps/" title="">Applications</a>,
<a href="/about/quotes/" title="">Quotes</a>,
<a href="/about/gettingstarted/" title="">Getting Started</a>,
<a href="/about/help/" title="">Help</a>,
<a href="http://brochure.getpython.info/" title="">Python Brochure</a>,
<a class="" href="/downloads/" title="">Downloads</a>,
<a href="/downloads/" title="">All releases</a>,
<a href="/downloads/source/" title="">Source code</a>,
<a href="/downloads/windows/" title="">Windows</a>,
<a href="/downloads/macos/" title="">macOS</a>,
<a href="/download/other/" title="">Other Platforms</a>,
<a href="https://docs.python.org/3/license.html" title="">License</a>,
<a href="/download/alternatives" title="">Alternative Implementations</a>,
<a class="" href="/doc/" title="">Documentation</a>,
<a href="/doc/" title="">Docs</a>,
<a href="/doc/av" title="">Audio/Visual Talks</a>,
<a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a>,
<a href="https://devguide.python.org/" title="">Developer's Guide</a>,
<a href="https://docs.python.org/faq/" title="">FAQ</a>,
<a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a>,
<a href="http://python.org/dev/peps/" title="">PEP Index</a>,
<a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a>,
<a href="/doc/essays/" title="">Python Essays</a>,
<a class="" href="/community/" title="">Community</a>,
<a href="/community/survey" title="">Community Survey</a>,
<a href="/community/diversity/" title="">Diversity</a>,
<a href="/community/lists/" title="">Mailing Lists</a>,
<a href="/community/irc/" title="">IRC</a>,
<a href="/community/forums/" title="">Forums</a>,
<a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a>,
<a href="/community/workshops/" title="">Python Conferences</a>,
<a href="/community/sigs/" title="">Special Interest Groups</a>,
<a href="/community/logos/" title="">Python Logo</a>,
<a href="https://wiki.python.org/moin/" title="">Python Wiki</a>,
<a href="/community/merchandise/" title="">Merchandise</a>,
<a href="/community/awards" title="">Community Awards</a>,
<a href="/psf/conduct/" title="">Code of Conduct</a>,
<a href="/psf/get-involved/" title="">Get Involved</a>,
<a href="/psf/community-stories/" title="">Shared Stories</a>,
<a class="" href="/success-stories/" title="success-stories">Success Stories</a>,
<a href="/success-stories/category/arts/" title="">Arts</a>,
<a href="/success-stories/category/business/" title="">Business</a>,
<a href="/success-stories/category/education/" title="">Education</a>,
<a href="/success-stories/category/engineering/" title="">Engineering</a>,
<a href="/success-stories/category/government/" title="">Government</a>,
<a href="/success-stories/category/scientific/" title="">Scientific</a>,
<a href="/success-stories/category/software-development/" title="">Software Development</a>,
<a class="" href="/blogs/" title="News from around the Python world">News</a>,
<a href="/blogs/" title="Python Insider Blog Posts">Python News</a>,
<a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a>,
<a href="http://planetpython.org/" title="Planet Python">Community News</a>,
<a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a>,
<a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a>,
<a class="" href="/events/" title="">Events</a>,
<a href="/events/python-events/" title="">Python Events</a>,
<a href="/events/python-user-group/" title="">User Group Events</a>,
<a href="/events/python-events/past/" title="">Python Events Archive</a>,
<a href="/events/python-user-group/past/" title="">User Group Events Archive</a>,
<a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a>,
<a class="button prompt" data-shell-container="#dive-into-python" href="/shell/" id="start-shell">>_
<span class="message">Launch Interactive Shell</span>
</a>,
<a href="//docs.python.org/3/tutorial/controlflow.html#defining-functions">More about defining functions in Python 3</a>,
<a href="//docs.python.org/3/tutorial/introduction.html#lists">More about lists in Python 3</a>,
<a href="http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator">More about simple math functions in Python 3</a>,
<a href="//docs.python.org/3/tutorial/">Whet your appetite</a>,
<a href="//docs.python.org/3/tutorial/controlflow.html">More control flow tools in Python 3</a>,
<a class="readmore" href="/doc/">Learn More</a>,
<a class="button button--dark button--small button--primary" href="https://surveys.jetbrains.com/s3/c1-python-developers-survey-2021" rel="noopener" style="color: #606060; border-color: #006dad; background-color: #006dad;" target="_blank">Take the 2021 survey!</a>,
<a href="/about/gettingstarted/">Start with our Beginner’s Guide</a>,
<a href="/downloads/release/python-3100/">Python 3.10.0</a>,
<a href="https://docs.python.org">docs.python.org</a>,
<a href="//jobs.python.org">jobs.python.org</a>,
<a href="https://blog.python.org" title="More News">More</a>,
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/ZDUoSt7NaWc/vicky-twomey-lee-awarded-psf-community.html">Vicky Twomey-Lee Awarded the PSF Community Service Award for Q3 2021</a>,
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/T_dCR6_vuA8/announcing-python-software-foundation.html">Announcing Python Software Foundation Fellow Members for Q3 2021! 🎉</a>,
<a href="http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/M9jMg4myXFk/join-python-developers-survey-2021.html">Join the Python Developers Survey 2021: Share and learn about the community</a>,
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/rfZ4c8nXGdk/python-3110a1-is-available.html">Python 3.11.0a1 is available</a>,
<a href="http://feedproxy.google.com/~r/PythonInsider/~3/ojK529j7CAQ/python-3100-is-available.html">Python 3.10.0 is available</a>,
<a href="/events/calendars/" title="More Events">More</a>,
<a href="/events/python-events/1140/">PyCon Chile</a>,
<a href="/events/python-user-group/1148/">Django Girls Groningen</a>,
<a href="/events/python-events/1154/">PyCon Japan 2021</a>,
<a href="/events/python-events/1104/">PyCon APAC 2021</a>,
<a href="/events/python-events/1044/">Xtreme Python</a>,
<a href="/success-stories/" title="More Success Stories">More</a>,
<a href="/success-stories/python-seo-link-analyzer/">"Python is all about automating repetitive tasks, leaving more time for your other SEO efforts."</a>,
<a href="/success-stories/python-seo-link-analyzer/">Using Python scripts to analyse SEO and broken links on your site</a>,
<a href="/about/apps" title="More Applications">More</a>,
<a class="tag" href="http://www.djangoproject.com/">Django</a>,
<a class="tag" href="http://www.pylonsproject.org/">Pyramid</a>,
<a class="tag" href="http://bottlepy.org">Bottle</a>,
<a class="tag" href="http://tornadoweb.org">Tornado</a>,
<a class="tag" href="http://flask.pocoo.org/">Flask</a>,
<a class="tag" href="http://www.web2py.com/">web2py</a>,
<a class="tag" href="http://wiki.python.org/moin/TkInter">tkInter</a>,
<a class="tag" href="https://wiki.gnome.org/Projects/PyGObject">PyGObject</a>,
<a class="tag" href="http://www.riverbankcomputing.co.uk/software/pyqt/intro">PyQt</a>,
<a class="tag" href="https://wiki.qt.io/PySide">PySide</a>,
<a class="tag" href="https://kivy.org/">Kivy</a>,
<a class="tag" href="http://www.wxpython.org/">wxPython</a>,
<a class="tag" href="http://www.scipy.org">SciPy</a>,
<a class="tag" href="http://pandas.pydata.org/">Pandas</a>,
<a class="tag" href="http://ipython.org">IPython</a>,
<a class="tag" href="http://buildbot.net/">Buildbot</a>,
<a class="tag" href="http://trac.edgewall.org/">Trac</a>,
<a class="tag" href="http://roundup.sourceforge.net/">Roundup</a>,
<a class="tag" href="http://www.ansible.com">Ansible</a>,
<a class="tag" href="http://www.saltstack.com">Salt</a>,
<a class="tag" href="https://www.openstack.org">OpenStack</a>,
<a class="tag" href="https://xon.sh">xonsh</a>,
<a href="/dev/peps/">Python Enhancement Proposals<span class="say-no-more"> (PEPs)</span></a>,
<a aria-hidden="true" class="rss-link" href="/dev/peps/peps.rss"><span class="icon-feed"></span> RSS</a>,
<a href="/psf/">Python Software Foundation</a>,
<a class="readmore" href="/psf/">Learn more</a>,
<a class="button" href="/users/membership/">Become a Member</a>,
<a class="button" href="/psf/donations/">Donate to the PSF</a>,
<a class="jump-link" href="#python-network" id="back-to-top-1"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>,
<a href="/about/">About</a>,
<a href="/about/apps/" title="">Applications</a>,
<a href="/about/quotes/" title="">Quotes</a>,
<a href="/about/gettingstarted/" title="">Getting Started</a>,
<a href="/about/help/" title="">Help</a>,
<a href="http://brochure.getpython.info/" title="">Python Brochure</a>,
<a href="/downloads/">Downloads</a>,
<a href="/downloads/" title="">All releases</a>,
<a href="/downloads/source/" title="">Source code</a>,
<a href="/downloads/windows/" title="">Windows</a>,
<a href="/downloads/macos/" title="">macOS</a>,
<a href="/download/other/" title="">Other Platforms</a>,
<a href="https://docs.python.org/3/license.html" title="">License</a>,
<a href="/download/alternatives" title="">Alternative Implementations</a>,
<a href="/doc/">Documentation</a>,
<a href="/doc/" title="">Docs</a>,
<a href="/doc/av" title="">Audio/Visual Talks</a>,
<a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner's Guide</a>,
<a href="https://devguide.python.org/" title="">Developer's Guide</a>,
<a href="https://docs.python.org/faq/" title="">FAQ</a>,
<a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a>,
<a href="http://python.org/dev/peps/" title="">PEP Index</a>,
<a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a>,
<a href="/doc/essays/" title="">Python Essays</a>,
<a href="/community/">Community</a>,
<a href="/community/survey" title="">Community Survey</a>,
<a href="/community/diversity/" title="">Diversity</a>,
<a href="/community/lists/" title="">Mailing Lists</a>,
<a href="/community/irc/" title="">IRC</a>,
<a href="/community/forums/" title="">Forums</a>,
<a href="/psf/annual-report/2021/" title="">PSF Annual Impact Report</a>,
<a href="/community/workshops/" title="">Python Conferences</a>,
<a href="/community/sigs/" title="">Special Interest Groups</a>,
<a href="/community/logos/" title="">Python Logo</a>,
<a href="https://wiki.python.org/moin/" title="">Python Wiki</a>,
<a href="/community/merchandise/" title="">Merchandise</a>,
<a href="/community/awards" title="">Community Awards</a>,
<a href="/psf/conduct/" title="">Code of Conduct</a>,
<a href="/psf/get-involved/" title="">Get Involved</a>,
<a href="/psf/community-stories/" title="">Shared Stories</a>,
<a href="/success-stories/" title="success-stories">Success Stories</a>,
<a href="/success-stories/category/arts/" title="">Arts</a>,
<a href="/success-stories/category/business/" title="">Business</a>,
<a href="/success-stories/category/education/" title="">Education</a>,
<a href="/success-stories/category/engineering/" title="">Engineering</a>,
<a href="/success-stories/category/government/" title="">Government</a>,
<a href="/success-stories/category/scientific/" title="">Scientific</a>,
<a href="/success-stories/category/software-development/" title="">Software Development</a>,
<a href="/blogs/" title="News from around the Python world">News</a>,
<a href="/blogs/" title="Python Insider Blog Posts">Python News</a>,
<a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a>,
<a href="http://planetpython.org/" title="Planet Python">Community News</a>,
<a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a>,
<a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a>,
<a href="/events/">Events</a>,
<a href="/events/python-events/" title="">Python Events</a>,
<a href="/events/python-user-group/" title="">User Group Events</a>,
<a href="/events/python-events/past/" title="">Python Events Archive</a>,
<a href="/events/python-user-group/past/" title="">User Group Events Archive</a>,
<a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a>,
<a href="/dev/">Contributing</a>,
<a href="https://devguide.python.org/" title="">Developer's Guide</a>,
<a href="https://bugs.python.org/" title="">Issue Tracker</a>,
<a href="https://mail.python.org/mailman/listinfo/python-dev" title="">python-dev list</a>,
<a href="/dev/core-mentorship/" title="">Core Mentorship</a>,
<a href="/dev/security/" title="">Report a Security Issue</a>,
<a class="jump-link" href="#python-network" id="back-to-top-2"><span aria-hidden="true" class="icon-arrow-up"><span>▲</span></span> Back to Top</a>,
<a href="/about/help/">Help & <span class="say-no-more">General</span> Contact</a>,
<a href="/community/diversity/">Diversity <span class="say-no-more">Initiatives</span></a>,
<a href="https://github.com/python/pythondotorg/issues">Submit Website Bug</a>,
<a href="https://status.python.org/">Status <span class="python-status-indicator-default" id="python-status-indicator"></span></a>,
<a href="/psf-landing/">Python Software Foundation</a>,
<a href="/about/legal/">Legal Statements</a>,
<a href="/privacy/">Privacy Policy</a>,
<a href="/psf/sponsorship/sponsors/#heroku">Powered by Heroku</a>]
What if we want to access their attributes?
for i in a:
    print(i['href'])
#content #python-network / /psf-landing/ https://docs.python.org https://pypi.org/ /jobs/ /community-landing/ #top / https://psfmember.org/civicrm/contribute/transact?reset=1&id=2 #site-map # javascript:; javascript:; javascript:; # https://www.facebook.com/pythonlang?fref=ts https://twitter.com/ThePSF /community/irc/ /about/ /about/apps/ /about/quotes/ /about/gettingstarted/ /about/help/ http://brochure.getpython.info/ /downloads/ /downloads/ /downloads/source/ /downloads/windows/ /downloads/macos/ /download/other/ https://docs.python.org/3/license.html /download/alternatives /doc/ /doc/ /doc/av https://wiki.python.org/moin/BeginnersGuide https://devguide.python.org/ https://docs.python.org/faq/ http://wiki.python.org/moin/Languages http://python.org/dev/peps/ https://wiki.python.org/moin/PythonBooks /doc/essays/ /community/ /community/survey /community/diversity/ /community/lists/ /community/irc/ /community/forums/ /psf/annual-report/2021/ /community/workshops/ /community/sigs/ /community/logos/ https://wiki.python.org/moin/ /community/merchandise/ /community/awards /psf/conduct/ /psf/get-involved/ /psf/community-stories/ /success-stories/ /success-stories/category/arts/ /success-stories/category/business/ /success-stories/category/education/ /success-stories/category/engineering/ /success-stories/category/government/ /success-stories/category/scientific/ /success-stories/category/software-development/ /blogs/ /blogs/ /psf/newsletter/ http://planetpython.org/ http://pyfound.blogspot.com/ http://pycon.blogspot.com/ /events/ /events/python-events/ /events/python-user-group/ /events/python-events/past/ /events/python-user-group/past/ https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event /shell/ //docs.python.org/3/tutorial/controlflow.html#defining-functions //docs.python.org/3/tutorial/introduction.html#lists http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator //docs.python.org/3/tutorial/ 
//docs.python.org/3/tutorial/controlflow.html /doc/ https://surveys.jetbrains.com/s3/c1-python-developers-survey-2021 /about/gettingstarted/ /downloads/release/python-3100/ https://docs.python.org //jobs.python.org https://blog.python.org http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/ZDUoSt7NaWc/vicky-twomey-lee-awarded-psf-community.html http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/T_dCR6_vuA8/announcing-python-software-foundation.html http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/M9jMg4myXFk/join-python-developers-survey-2021.html http://feedproxy.google.com/~r/PythonInsider/~3/rfZ4c8nXGdk/python-3110a1-is-available.html http://feedproxy.google.com/~r/PythonInsider/~3/ojK529j7CAQ/python-3100-is-available.html /events/calendars/ /events/python-events/1140/ /events/python-user-group/1148/ /events/python-events/1154/ /events/python-events/1104/ /events/python-events/1044/ /success-stories/ /success-stories/python-seo-link-analyzer/ /success-stories/python-seo-link-analyzer/ /about/apps http://www.djangoproject.com/ http://www.pylonsproject.org/ http://bottlepy.org http://tornadoweb.org http://flask.pocoo.org/ http://www.web2py.com/ http://wiki.python.org/moin/TkInter https://wiki.gnome.org/Projects/PyGObject http://www.riverbankcomputing.co.uk/software/pyqt/intro https://wiki.qt.io/PySide https://kivy.org/ http://www.wxpython.org/ http://www.scipy.org http://pandas.pydata.org/ http://ipython.org http://buildbot.net/ http://trac.edgewall.org/ http://roundup.sourceforge.net/ http://www.ansible.com http://www.saltstack.com https://www.openstack.org https://xon.sh /dev/peps/ /dev/peps/peps.rss /psf/ /psf/ /users/membership/ /psf/donations/ #python-network /about/ /about/apps/ /about/quotes/ /about/gettingstarted/ /about/help/ http://brochure.getpython.info/ /downloads/ /downloads/ /downloads/source/ /downloads/windows/ /downloads/macos/ /download/other/ https://docs.python.org/3/license.html /download/alternatives 
/doc/ /doc/ /doc/av https://wiki.python.org/moin/BeginnersGuide https://devguide.python.org/ https://docs.python.org/faq/ http://wiki.python.org/moin/Languages http://python.org/dev/peps/ https://wiki.python.org/moin/PythonBooks /doc/essays/ /community/ /community/survey /community/diversity/ /community/lists/ /community/irc/ /community/forums/ /psf/annual-report/2021/ /community/workshops/ /community/sigs/ /community/logos/ https://wiki.python.org/moin/ /community/merchandise/ /community/awards /psf/conduct/ /psf/get-involved/ /psf/community-stories/ /success-stories/ /success-stories/category/arts/ /success-stories/category/business/ /success-stories/category/education/ /success-stories/category/engineering/ /success-stories/category/government/ /success-stories/category/scientific/ /success-stories/category/software-development/ /blogs/ /blogs/ /psf/newsletter/ http://planetpython.org/ http://pyfound.blogspot.com/ http://pycon.blogspot.com/ /events/ /events/python-events/ /events/python-user-group/ /events/python-events/past/ /events/python-user-group/past/ https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event /dev/ https://devguide.python.org/ https://bugs.python.org/ https://mail.python.org/mailman/listinfo/python-dev /dev/core-mentorship/ /dev/security/ #python-network /about/help/ /community/diversity/ https://github.com/python/pythondotorg/issues https://status.python.org/ /psf-landing/ /about/legal/ /privacy/ /psf/sponsorship/sponsors/#heroku
You can use select_one and select in a similar fashion to find and findAll, but these functions are more powerful: they let you define complex conditions with CSS selector syntax!
For example, the first line below finds the first div with the id nojs that is a direct child of another div. Check out the documentation for more awesome tricks!
What if we want to access the text?
The goal of this exercise is to familiarize you with inspecting HTML source code by extracting information from a table.
When working with Persian letters, requests can sometimes get the encoding wrong and show strange characters. If this happens, restart the kernel and run the code again.
By inspecting these tag names in the webpage you're trying to crawl, give a short description of what they represent:
Explain what this code is doing: It first finds all the rows, which have the "tr" tag.
Explain what this code is doing: For each row, it finds all cells with the "td" tag and prints the text of each cell.
Here the output is the results for each team. As you can see, some of the lines have extra spaces or extra characters like "\n" & "\r". We can use the replace('a', 'b') and strip() functions on any string to deal with these.
Now let's search for a nice new laptop on Digikala. :) If we want to crawl laptops from all pages, we should change the URL accordingly.
In this example, we want to crawl the first 20 pages. Crawling 20 pages might take a while to finish, so begin with just a few pages and increase the number when you're sure about your code. Using the tqdm library helps by showing a progress bar!
Your tasks are as follows:
720 rows × 4 columns
You can use any method to do the division. I recommend working with Regex! You can play around with your Regex patterns here
66 rows × 6 columns
Now let's check the flights to your favorite destination in [Alibaba](https://www.alibaba.ir/iranout) and crawl their info. But wait a minute, this process sounds a little hard for Beautiful Soup! It includes filling in interactive forms and scrolling to generate new tickets. These things are handled by JavaScript, but BeautifulSoup only works with static HTML source code... We need a more powerful crawler. We need... Selenium.
Selenium is an open-source automated testing framework for web applications. Selenium provides a playback tool for authoring functional tests without the need to learn a test scripting language.
Working with Selenium requires installing a WebDriver. Selenium WebDriver is a web framework that lets you execute cross-browser tests. It is used for automating web-based application testing to verify that the application performs as expected. Selenium WebDriver allows you to choose a programming language to create test scripts.
You should see your driver with an ls command.
Now, run this code and watch the magic happen. Pay attention to what is happening in your new browser window and try to work out what the code does. Note that when you open a page, the driver does not proceed until the page has fully loaded, so be patient! Your code is working fine. :) If you encounter unsolvable issues with browsers other than Chrome, I recommend switching to Chrome.
Pretty neat, eh? :D The code below opens Google, searches for 'Python', opens the first link, then closes the browser.
The goal of this example is to familiarize you with the basics of Selenium. So play around with this code block and even change stuff until you have grasped the concepts.
Finding elements in Selenium is similar to Beautiful Soup. In the beginning lines of the code below,
The figure below shows the attributes of the search bar in Google. So in the code below, we find it by its name, 'q'.
Find and fill in the class name of the Google search button in the desired part of the code below! (You cannot run the code successfully without doing this first.) Run the code after doing this. Don't change anything else.
You can use XPath when the element does not have any specific attributes, or when its attributes are prone to change in the future. XPath finds the element by its relative location within the tags of an XML or HTML file. The figure below shows how you can get the XPath of the first Google search result. You can write your own custom XPath too!
Similar to XPath, we can find an element by CSS selector too, which in syntax, is the equivalent of
You will probably encounter this error: element not interactable. This error happens for one of two reasons: either there is more than one element matching the specified condition, or the element has not fully loaded. By default, the driver does not wait for our writes to finish or for the element to fully load, so it may be trying to find an item that does not exist yet! We should always make small pauses between our steps. There are better ways to tell the driver to wait too: you can explicitly wait until an expected condition occurs.
Now back to checking tickets! I have already written some code to get ticket information. I'm looking for one-way flights from Tehran to Copenhagen on the 1st of Aban. Run the code and watch the crawling process. Change the destination & the date!
Explain in a paragraph what the code is doing in each section, what information it is crawling, & where it takes each piece of information from:
Your answer here: The code uses CSS selectors to pick the page elements, so in the first section these values are specified for the forms and so on. It obtains the airline, the origin and destination airports and times, and the price from the main page.
When I first wrote this code, the website looked like the first figure below, but recently they have changed the style and it looks like the second figure below. So in the code we first switch back to the old website, and then proceed.
The code below introduces 3 new concepts,
This is good! But you know what would be better? Crawling the same information from [Mrbilit](mrbilit.com) & comparing the prices of these websites! Note that in the case of flights with stops, Flight_number by itself cannot be used as a key, so I recommend joining the tables on more than one column.
Get crawling! But always treat the data & its owners with respect. There are a number of online articles about the ethics of crawling. Check them out if you are interested. :)
show = soup.select_one('div > div#nojs')
# show = soup.select('div > div.do-not-print')
show
<div class="do-not-print" id="nojs">
<p><strong>Notice:</strong> While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. </p>
</div>
show.text
'\nNotice: While JavaScript is not essential for this website, your interaction with the content will be limited. Please turn JavaScript on for the full experience. \n'
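To make the difference between select_one and select concrete, here is a minimal sketch on a small hand-written HTML snippet. The snippet, its ids, and its class names are made up for illustration and are not taken from python.org.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html = """
<div id="outer">
  <div class="note" id="first">one</div>
  <div class="note">two</div>
  <p class="note">three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching tag (or None), similar to find
first = soup.select_one("div > div.note")

# select returns a list with every matching tag, similar to findAll
notes = soup.select("div > div.note")

print(first["id"])
print([n.text for n in notes])
```

The selector "div > div.note" only matches div tags with class note that are direct children of a div, so the p tag is skipped even though it has the same class.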
Task1: Football table
import requests
from bs4 import BeautifulSoup
''' Replace the URL to your table here'''
url = 'https://www.varzesh3.com/table/%D8%AC%D8%AF%D9%88%D9%84-%D8%A7%D9%86%DA%AF%D9%84%DB%8C%D8%B3-2022-2021-%D9%84%DB%8C%DA%AF-%D8%A8%D8%B1%D8%AA%D8%B1'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
table
<table border="0" cellpadding="0" cellspacing="0" class="full-table">
<thead>
<tr>
<th align="center" class="full-table--header" colspan="11">
انگلیس لیگ برتر
</th>
</tr>
<tr>
<th class="t000" rowspan="1"></th>
<th class="t100" rowspan="1">تيم</th>
<th class="t200">بازيها</th>
<th class="t300">برد</th>
<th class="t400">مساوی</th>
<th class="t500">باخت</th>
<th class="t600">گل زده</th>
<th class="t700">گل خورده</th>
<th class="t800">تفاضل گل</th>
<th class="t900">امتياز</th>
<th class="t900"></th>
</tr>
</thead>
<tbody>
<tr class="">
<td class="t001" height="15">1</td>
<td class="t101">
<a href="/team/چلسی" title="چلسی">چلسی</a>
</td>
<td class="t201">9</td>
<td class="t301">7</td>
<td class="t401">1</td>
<td class="t501">1</td>
<td class="t601">23</td>
<td class="t701">3</td>
<td class="t801">20</td>
<td class="t901">
22
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">2</td>
<td class="t101">
<a href="/team/لیورپول" title="لیورپول">لیورپول</a>
</td>
<td class="t201">9</td>
<td class="t301">6</td>
<td class="t401">3</td>
<td class="t501">0</td>
<td class="t601">27</td>
<td class="t701">6</td>
<td class="t801">21</td>
<td class="t901">
21
</td>
</tr>
<tr class="">
<td class="t001" height="15">3</td>
<td class="t101">
<a href="/team/منچسترسیتی" title="منچسترسیتی">منچسترسیتی</a>
</td>
<td class="t201">9</td>
<td class="t301">6</td>
<td class="t401">2</td>
<td class="t501">1</td>
<td class="t601">20</td>
<td class="t701">4</td>
<td class="t801">16</td>
<td class="t901">
20
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">4</td>
<td class="t101">
<a href="/team/وستهام" title="وستهام">وستهام</a>
</td>
<td class="t201">9</td>
<td class="t301">5</td>
<td class="t401">2</td>
<td class="t501">2</td>
<td class="t601">16</td>
<td class="t701">10</td>
<td class="t801">6</td>
<td class="t901">
17
</td>
</tr>
<tr class="">
<td class="t001" height="15">5</td>
<td class="t101">
<a href="/team/برایتون" title="برایتون">برایتون</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">3</td>
<td class="t501">2</td>
<td class="t601">9</td>
<td class="t701">9</td>
<td class="t801">0</td>
<td class="t901">
15
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">6</td>
<td class="t101">
<a href="/team/تاتنهام" title="تاتنهام">تاتنهام</a>
</td>
<td class="t201">9</td>
<td class="t301">5</td>
<td class="t401">0</td>
<td class="t501">4</td>
<td class="t601">9</td>
<td class="t701">13</td>
<td class="t801">-4</td>
<td class="t901">
15
</td>
</tr>
<tr class="">
<td class="t001" height="15">7</td>
<td class="t101">
<a href="/team/منچستریونایتد" title="منچستریونایتد">منچستریونایتد</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">2</td>
<td class="t501">3</td>
<td class="t601">16</td>
<td class="t701">15</td>
<td class="t801">1</td>
<td class="t901">
14
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">8</td>
<td class="t101">
<a href="/team/اورتون" title="اورتون">اورتون</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">2</td>
<td class="t501">3</td>
<td class="t601">15</td>
<td class="t701">14</td>
<td class="t801">1</td>
<td class="t901">
14
</td>
</tr>
<tr class="">
<td class="t001" height="15">9</td>
<td class="t101">
<a href="/team/لسترسیتی" title="لسترسیتی">لسترسیتی</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">2</td>
<td class="t501">3</td>
<td class="t601">15</td>
<td class="t701">15</td>
<td class="t801">0</td>
<td class="t901">
14
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">10</td>
<td class="t101">
<a href="/team/آرسنال" title="آرسنال">آرسنال</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">2</td>
<td class="t501">3</td>
<td class="t601">10</td>
<td class="t701">13</td>
<td class="t801">-3</td>
<td class="t901">
14
</td>
</tr>
<tr class="">
<td class="t001" height="15">11</td>
<td class="t101">
<a href="/team/ولورهمپتون" title="ولورهمپتون">ولورهمپتون</a>
</td>
<td class="t201">9</td>
<td class="t301">4</td>
<td class="t401">1</td>
<td class="t501">4</td>
<td class="t601">9</td>
<td class="t701">9</td>
<td class="t801">0</td>
<td class="t901">
13
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">12</td>
<td class="t101">
برنتفورد </td>
<td class="t201">9</td>
<td class="t301">3</td>
<td class="t401">3</td>
<td class="t501">3</td>
<td class="t601">11</td>
<td class="t701">9</td>
<td class="t801">2</td>
<td class="t901">
12
</td>
</tr>
<tr class="">
<td class="t001" height="15">13</td>
<td class="t101">
<a href="/team/استون-ویلا" title="استون ویلا">استون ویلا</a>
</td>
<td class="t201">9</td>
<td class="t301">3</td>
<td class="t401">1</td>
<td class="t501">5</td>
<td class="t601">13</td>
<td class="t701">15</td>
<td class="t801">-2</td>
<td class="t901">
10
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">14</td>
<td class="t101">
<a href="/team/واتفورد" title="واتفورد">واتفورد</a>
</td>
<td class="t201">9</td>
<td class="t301">3</td>
<td class="t401">1</td>
<td class="t501">5</td>
<td class="t601">12</td>
<td class="t701">17</td>
<td class="t801">-5</td>
<td class="t901">
10
</td>
</tr>
<tr class="">
<td class="t001" height="15">15</td>
<td class="t101">
<a href="/team/کریستال-پالاس" title="کریستال پالاس">کریستال پالاس</a>
</td>
<td class="t201">9</td>
<td class="t301">1</td>
<td class="t401">6</td>
<td class="t501">2</td>
<td class="t601">11</td>
<td class="t701">14</td>
<td class="t801">-3</td>
<td class="t901">
9
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">16</td>
<td class="t101">
<a href="/team/ساوتهمپتون" title="ساوتهمپتون">ساوتهمپتون</a>
</td>
<td class="t201">9</td>
<td class="t301">1</td>
<td class="t401">5</td>
<td class="t501">3</td>
<td class="t601">8</td>
<td class="t701">12</td>
<td class="t801">-4</td>
<td class="t901">
8
</td>
</tr>
<tr class="">
<td class="t001" height="15">17</td>
<td class="t101">
لیدز </td>
<td class="t201">9</td>
<td class="t301">1</td>
<td class="t401">4</td>
<td class="t501">4</td>
<td class="t601">8</td>
<td class="t701">16</td>
<td class="t801">-8</td>
<td class="t901">
7
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">18</td>
<td class="t101">
<a href="/team/برنلی" title="برنلی">برنلی</a>
</td>
<td class="t201">9</td>
<td class="t301">0</td>
<td class="t401">4</td>
<td class="t501">5</td>
<td class="t601">7</td>
<td class="t701">15</td>
<td class="t801">-8</td>
<td class="t901">
4
</td>
</tr>
<tr class="">
<td class="t001" height="15">19</td>
<td class="t101">
<a href="/team/نیوکسل" title="نیوکسل">نیوکسل</a>
</td>
<td class="t201">9</td>
<td class="t301">0</td>
<td class="t401">4</td>
<td class="t501">5</td>
<td class="t601">11</td>
<td class="t701">20</td>
<td class="t801">-9</td>
<td class="t901">
4
</td>
</tr>
<tr class="alternative">
<td class="t001" height="15">20</td>
<td class="t101">
<a href="/team/نوریچ" title="نوریچ">نوریچ</a>
</td>
<td class="t201">9</td>
<td class="t301">0</td>
<td class="t401">2</td>
<td class="t501">7</td>
<td class="t601">2</td>
<td class="t701">23</td>
<td class="t801">-21</td>
<td class="t901">
2
</td>
</tr>
</tbody>
</table>
<thead>: the header of the table, including the title انگلیس لیگ برتر (England Premier League) and the column names such as تیم (team) and بازیها (games played).
<tr>: a table row; each row in the body holds one team's information.
<th>: a header cell, used in the header rows of the table.
<tbody>: the body of the table, which contains the data rows.
<td>: a standard data cell within a row, holding values such as the team name, games played, wins and losses.
rows = table.find_all('tr')
for row in rows:
    for head in row.find_all('th'):
        print([head.text])
['\r\n انگلیس لیگ برتر\r\n ']
['']
['تيم']
['بازيها']
['برد']
['مساوی']
['باخت']
['گل زده']
['گل خورده']
['تفاضل گل']
['امتياز']
['']
for row in rows:
    for body in row.find_all('td'):
        print([body.text])
['1']
['\nچلسی\n']
['9']
['7']
['1']
['1']
['23']
['3']
['20']
['\r\n 22\r\n ']
['2']
['\nلیورپول\n']
['9']
['6']
['3']
['0']
['27']
['6']
['21']
['\r\n 21\r\n ']
['3']
['\nمنچسترسیتی\n']
['9']
['6']
['2']
['1']
['20']
['4']
['16']
['\r\n 20\r\n ']
['4']
['\nوستهام\n']
['9']
['5']
['2']
['2']
['16']
['10']
['6']
['\r\n 17\r\n ']
['5']
['\nبرایتون\n']
['9']
['4']
['3']
['2']
['9']
['9']
['0']
['\r\n 15\r\n ']
['6']
['\nتاتنهام\n']
['9']
['5']
['0']
['4']
['9']
['13']
['-4']
['\r\n 15\r\n ']
['7']
['\nمنچستریونایتد\n']
['9']
['4']
['2']
['3']
['16']
['15']
['1']
['\r\n 14\r\n ']
['8']
['\nاورتون\n']
['9']
['4']
['2']
['3']
['15']
['14']
['1']
['\r\n 14\r\n ']
['9']
['\nلسترسیتی\n']
['9']
['4']
['2']
['3']
['15']
['15']
['0']
['\r\n 14\r\n ']
['10']
['\nآرسنال\n']
['9']
['4']
['2']
['3']
['10']
['13']
['-3']
['\r\n 14\r\n ']
['11']
['\nولورهمپتون\n']
['9']
['4']
['1']
['4']
['9']
['9']
['0']
['\r\n 13\r\n ']
['12']
['\r\nبرنت\u200cفورد ']
['9']
['3']
['3']
['3']
['11']
['9']
['2']
['\r\n 12\r\n ']
['13']
['\nاستون ویلا\n']
['9']
['3']
['1']
['5']
['13']
['15']
['-2']
['\r\n 10\r\n ']
['14']
['\nواتفورد\n']
['9']
['3']
['1']
['5']
['12']
['17']
['-5']
['\r\n 10\r\n ']
['15']
['\nکریستال پالاس\n']
['9']
['1']
['6']
['2']
['11']
['14']
['-3']
['\r\n 9\r\n ']
['16']
['\nساوتهمپتون\n']
['9']
['1']
['5']
['3']
['8']
['12']
['-4']
['\r\n 8\r\n ']
['17']
['\r\nلیدز ']
['9']
['1']
['4']
['4']
['8']
['16']
['-8']
['\r\n 7\r\n ']
['18']
['\nبرنلی\n']
['9']
['0']
['4']
['5']
['7']
['15']
['-8']
['\r\n 4\r\n ']
['19']
['\nنیوکسل\n']
['9']
['0']
['4']
['5']
['11']
['20']
['-9']
['\r\n 4\r\n ']
['20']
['\nنوریچ\n']
['9']
['0']
['2']
['7']
['2']
['23']
['-21']
['\r\n 2\r\n ']
Use the replace('a', 'b') and strip() functions on any string to deal with these extra characters.
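As a quick illustration of these two string methods, the sketch below cleans a few cell texts shaped like the ones printed above. The helper name clean is a hypothetical choice for this example.

```python
# Sample cell texts shaped like the output above
raw_cells = ['\r\n 22\r\n ', '\nچلسی\n', '9']

def clean(text):
    # strip() removes leading and trailing whitespace, including \r and \n;
    # replace() then removes any such characters left inside the string
    return text.strip().replace('\n', '').replace('\r', '')

cleaned = [clean(c) for c in raw_cells]
print(cleaned)
```

In most cases strip() alone is enough, because the unwanted characters sit at the ends of the string; replace() only matters when they also appear in the middle.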
rows = table.find_all('tr')
datas = []
for row in rows:
    data = []
    # header cells (only the first 10; the last header cell is empty)
    for head in row.find_all('th')[:10]:
        h = head.text
        h = h.strip()
        h = h.replace("\n", '')
        h = h.replace("\t", '')
        data.append(h)
    # data cells
    for body in row.find_all('td')[:10]:
        b = body.text
        b = b.strip()
        b = b.replace("\n", '')
        b = b.replace("\t", '')
        data.append(b)
    datas.append(data)
datas
[['انگلیس لیگ برتر'],
['',
'تيم',
'بازيها',
'برد',
'مساوی',
'باخت',
'گل زده',
'گل خورده',
'تفاضل گل',
'امتياز'],
['1', 'چلسی', '9', '7', '1', '1', '23', '3', '20', '22'],
['2', 'لیورپول', '9', '6', '3', '0', '27', '6', '21', '21'],
['3', 'منچسترسیتی', '9', '6', '2', '1', '20', '4', '16', '20'],
['4', 'وستهام', '9', '5', '2', '2', '16', '10', '6', '17'],
['5', 'برایتون', '9', '4', '3', '2', '9', '9', '0', '15'],
['6', 'تاتنهام', '9', '5', '0', '4', '9', '13', '-4', '15'],
['7', 'منچستریونایتد', '9', '4', '2', '3', '16', '15', '1', '14'],
['8', 'اورتون', '9', '4', '2', '3', '15', '14', '1', '14'],
['9', 'لسترسیتی', '9', '4', '2', '3', '15', '15', '0', '14'],
['10', 'آرسنال', '9', '4', '2', '3', '10', '13', '-3', '14'],
['11', 'ولورهمپتون', '9', '4', '1', '4', '9', '9', '0', '13'],
['12', 'برنت\u200cفورد', '9', '3', '3', '3', '11', '9', '2', '12'],
['13', 'استون ویلا', '9', '3', '1', '5', '13', '15', '-2', '10'],
['14', 'واتفورد', '9', '3', '1', '5', '12', '17', '-5', '10'],
['15', 'کریستال پالاس', '9', '1', '6', '2', '11', '14', '-3', '9'],
['16', 'ساوتهمپتون', '9', '1', '5', '3', '8', '12', '-4', '8'],
['17', 'لیدز', '9', '1', '4', '4', '8', '16', '-8', '7'],
['18', 'برنلی', '9', '0', '4', '5', '7', '15', '-8', '4'],
['19', 'نیوکسل', '9', '0', '4', '5', '11', '20', '-9', '4'],
['20', 'نوریچ', '9', '0', '2', '7', '2', '23', '-21', '2']]
Convert datas to a pandas DataFrame with proper column names, no empty rows or columns, and all numbers as integers.
import pandas as pd
df = pd.DataFrame(datas[2:], columns=datas[1]).rename(columns={'':'رتبه'})
int_columns = [df.columns[0]] + list(df.columns[2:])
df[int_columns] = df[int_columns].astype(int)
df.head()
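| | رتبه | تيم | بازيها | برد | مساوی | باخت | گل زده | گل خورده | تفاضل گل | امتياز |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | چلسی | 9 | 7 | 1 | 1 | 23 | 3 | 20 | 22 |
| 1 | 2 | لیورپول | 9 | 6 | 3 | 0 | 27 | 6 | 21 | 21 |
| 2 | 3 | منچسترسیتی | 9 | 6 | 2 | 1 | 20 | 4 | 16 | 20 |
| 3 | 4 | وستهام | 9 | 5 | 2 | 2 | 16 | 10 | 6 | 17 |
| 4 | 5 | برایتون | 9 | 4 | 3 | 2 | 9 | 9 | 0 | 15 |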
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 رتبه 20 non-null int64
1 تيم 20 non-null object
2 بازيها 20 non-null int64
3 برد 20 non-null int64
4 مساوی 20 non-null int64
5 باخت 20 non-null int64
6 گل زده 20 non-null int64
7 گل خورده 20 non-null int64
8 تفاضل گل 20 non-null int64
9 امتياز 20 non-null int64
dtypes: int64(9), object(1)
memory usage: 1.7+ KB
Task2: Digikala laptop search
Below you can see the URL of the first page in the laptop section, sorted by the most viewed.
https://www.digikala.com/search/category-notebook-netbook-ultrabook/?pageno=1&sortby=4
The tqdm library helps by showing a progress bar! Install it with pip or conda before running the code below.
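Since only the pageno value changes between result pages, the page URLs can be generated programmatically. The sketch below is one way to do it with the standard library; the helper name page_url is hypothetical, and it assumes the site keeps the pageno and sortby query parameters shown in the URL above.

```python
from urllib.parse import urlencode

base = 'https://www.digikala.com/search/category-notebook-netbook-ultrabook/'

def page_url(page, sortby=4):
    # Build the query string (?pageno=...&sortby=...) for one result page
    return base + '?' + urlencode({'pageno': page, 'sortby': sortby})

# Start with a few pages and increase the range once the code works
urls = [page_url(p) for p in range(1, 4)]
for u in urls:
    print(u)
```

An f-string works just as well for a URL this simple; urlencode mainly helps when parameter values need escaping.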
For loop below
import requests
from bs4 import BeautifulSoup
import pandas
from tqdm import tqdm
url = 'https://www.digikala.com/search/category-notebook-netbook-ultrabook/?pageno='
titles = []
engagements = []
stars = []
prices = []
for page in tqdm(range(1, 21)):
    ''' Enter your code here to Change the URL page '''
    new_url = f'{url}{page}&sortby=4'
    # use a separate name so the loop variable `page` is not overwritten
    response = requests.get(new_url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Get name of product
    title = soup.select('div.c-product-box__content--row')
    for t in title:
        name = t.text
        titles.append(name)
    # Get price of product
    priceBoxes = soup.findAll('div', {'class': 'c-price__value c-price__value--plp js-plp-product-card-price'})
    for each_price in priceBoxes:
        # a PriceBox contains the initial price, discount %, and final price; we want only the final price
        price = each_price.find('div', {'class': 'c-price__value-wrapper'})
        price = price.text
        prices.append(price)
    ''' Enter your code here to get engagement number'''
    engagements_span = soup.findAll('span', {'class': 'c-product-box__engagement-rating-num'})
    for engagement in engagements_span:
        s = engagement.text
        # keep only the number between the parentheses
        engagements.append(s[s.find("(")+1:s.find(")")])
    ''' Enter your code here to get star number'''
    stars_div = soup.select('div.c-product-box__engagement-rating')
    for star in stars_div:
        stars.append(star.text.split()[0])
# Saving info in a dictionary
product = {'Title': titles, 'Engagements': engagements, 'Stars': stars, 'Prices': prices}
# Saving dictionary in a dataframe
# orient='index' plus transpose tolerates lists of different lengths (missing values become None)
data = pandas.DataFrame.from_dict(product, orient='index')
data = data.transpose()
data
100%|██████████| 20/20 [00:20<00:00, 1.01s/it]
| | Title | Engagements | Stars | Prices |
|---|---|---|---|---|
| 0 | لپ تاپ 15 اینچی ایسوس مدل VivoBook R521JA-BQ08... | ۶۶ | ۴ | \n ۱۴,۸۹۹,۰... |
| 1 | لپ تاپ 15.6 اینچی ایسوس مدل X543MA-GQ1013ASUS ... | ۵۰ | ۴ | \n ۸,۸۸۰,۰۰... |
| 2 | لپ تاپ 15.6 اینچی ایسوس مدل VivoBook S533EQ - ... | ۱۱۱ | ۴.۱ | \n ۳۴,۷۵۰,۰... |
| 3 | لپ تاپ 15 اینچی لنوو مدل Ideapad 330 - ELenovo... | ۱۰۲۳ | ۳.۸ | \n ۹,۴۵۰,۰۰... |
| 4 | لپ تاپ 15 اینچی لنوو مدل Ideapad 330 - NXBLeno... | ۶۸۷ | ۳.۸ | \n ۱۰,۴۰۰,۰... |
| ... | ... | ... | ... | ... |
| 715 | لپ تاپ دل اینسپایرون 5110Dell Inspiron 5110-N | None | None | None |
| 716 | لپ تاپ 17 اینچی ایسوس مدل TUF GAMING FX706IIAS... | None | None | None |
| 717 | لپ تاپ 15.6 اینچی اچ پی مدل Pavilion Gaming 15... | None | None | None |
| 718 | لپ تاپ 15.6 اینچی ام اس آی مدل MODERN 15-D A10... | None | None | None |
| 719 | لپ تاپ 15.6 اینچی لنوو مدل Ideapad L340-R7Leno... | None | None | None |
'''Enter your code here'''
import re
df = data.copy()
# cleaning: missing values become 0
df = df.fillna(0)
# convert the numeric columns
df['Stars'] = df['Stars'].astype(float)
df['Engagements'] = df['Engagements'].astype(int)
def find_prices(s):
    '''
    keep only the Persian digits of the price string
    '''
    try:
        return ''.join(re.findall(r'[۰-۹]+', s))
    except Exception:
        # s is not a string (for example the 0 inserted by fillna)
        return 0
df['Prices'] = df['Prices'].apply(find_prices).astype(int)
def find_size(s):
    '''
    find float or int number near "inch" word
    '''
    try:
        return re.findall(r'\d+\.\d+|[0-9]+ inch', s)[0].split()[0]
    except Exception:
        return 0
df['Size'] = df['Title'].apply(find_size).astype(float)
def find_brand(s):
    '''
    find brand between two words: اینچ و مدل
    '''
    try:
        return re.search(r'اینچی(.*?)مدل', s).group(1).strip().replace(' ', '')
    except Exception:
        try:
            return re.search(r'اینچ(.*?)مدل', s).group(1).strip().replace(' ', '')
        except Exception:
            return ''
df['Brand'] = df['Title'].apply(find_brand)
def find_model(s):
    '''
    find model between the word مدل and the inch size
    '''
    try:
        return re.search(r'مدل(.*?)(\d+\.\d+|[0-9]+) inch', s).group(1).strip().strip('-').strip()
    except Exception:
        return ''
df['Model'] = df['Title'].apply(find_model)
del df['Title']
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 720 entries, 0 to 719
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Engagements 720 non-null int64
1 Stars 720 non-null float64
2 Prices 720 non-null int64
3 Size 720 non-null float64
4 Brand 720 non-null object
5 Model 720 non-null object
dtypes: float64(2), int64(2), object(2)
memory usage: 33.9+ KB
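The price strings contain Persian digits, commas, and whitespace. Python's int() accepts Persian digits directly, but mapping them to ASCII first can make the cleaning step easier to inspect. The following is a minimal sketch of that idea; the helper name persian_price_to_int and the sample string are assumptions made for illustration.

```python
# Translation table from Persian digits (U+06F0..U+06F9) to ASCII digits
PERSIAN_TO_ASCII = {0x06F0 + i: str(i) for i in range(10)}

def persian_price_to_int(text):
    # Translate the digits, then keep only ASCII digits,
    # which drops commas, newlines and spaces
    translated = text.translate(PERSIAN_TO_ASCII)
    digits = ''.join(ch for ch in translated if ch in '0123456789')
    return int(digits) if digits else 0

# Hypothetical raw price text, shaped like the values in the Prices column
sample = '\n ۸,۸۸۰,۰۰۰ '
print(persian_price_to_int(sample))
```

Returning 0 for strings without digits mirrors the fillna(0) convention used above, so rows with a missing price stay easy to filter out later.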
df.head()
| | Engagements | Stars | Prices | Size | Brand | Model |
|---|---|---|---|---|---|---|
| 0 | 66 | 4.0 | 14899000 | 15.6 | ایسوس | VivoBook R521JA-BQ083ASUS VivoBook R521JA-BQ083 |
| 1 | 50 | 4.0 | 8880000 | 15.6 | ایسوس | X543MA-GQ1013ASUS X543MA-GQ1013 |
| 2 | 111 | 4.1 | 34750000 | 15.6 | ایسوس | VivoBook S533EQ - AASUS VivoBook S533EQ - A |
| 3 | 1023 | 3.8 | 9450000 | 15.0 | لنوو | Ideapad 330 - ELenovo Ideapad 330 - E |
| 4 | 687 | 3.8 | 10400000 | 15.0 | لنوو | Ideapad 330 - NXBLenovo Ideapad 330 - NXB |
'''Enter your code here'''
df[(df['Brand']=='ایسوس') & (df['Size']==15.6)]
|  | Engagements | Stars | Prices | Size | Brand | Model |
|---|---|---|---|---|---|---|
| 0 | 66 | 4.0 | 14899000 | 15.6 | ایسوس | VivoBook R521JA-BQ083ASUS VivoBook R521JA-BQ083 |
| 1 | 50 | 4.0 | 8880000 | 15.6 | ایسوس | X543MA-GQ1013ASUS X543MA-GQ1013 |
| 2 | 111 | 4.1 | 34750000 | 15.6 | ایسوس | VivoBook S533EQ - AASUS VivoBook S533EQ - A |
| 10 | 44 | 3.9 | 8870000 | 15.6 | ایسوس | X543MA-GQ1304ASUS X543MA-GQ1304 |
| 30 | 4 | 3.8 | 23980000 | 15.6 | ایسوس | R565 EP- BQ322ASUS R565 EP- BQ322 |
| ... | ... | ... | ... | ... | ... | ... |
| 658 | 0 | 0.0 | 0 | 15.6 | ایسوس | VivoBook R545FJ - AASUS VivoBook R545FJ - A |
| 667 | 0 | 0.0 | 0 | 15.6 | ایسوس | VivoBook R545-FJ-BQ093ASUS VivoBook R545-FJ-BQ093 |
| 670 | 0 | 0.0 | 0 | 15.6 | ایسوس | TUF GAMING FX506LI-HN147ASUS TUF GAMING FX506L... |
| 673 | 0 | 0.0 | 0 | 15.6 | ایسوس | VivoBook S533EQ - BASUS VivoBook S533EQ - B |
| 705 | 0 | 0.0 | 0 | 15.6 | ایسوس | VivoBook R565JF-BQ79ASUS VivoBook R565JF-BQ79 |
'''Enter your code here'''
df.groupby('Brand')['Engagements'].sum().sort_values(ascending=False)
Brand
لنوو 6113
ایسوس 1232
دل 893
اچپی 771
اچپی 400
اپل 304
ایسر 252
183
هوآوی 181
مایکروسافت 140
اماسآی 92
پورشدیزاین 79
اپلمکبوک 61
ریزر 14
شیائومی 0
Name: Engagements, dtype: int64
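The result above lists what looks like the same brand (اچپی) twice, which suggests that some brand strings differ only by a character you cannot see. One possible cause, and this is an assumption since the raw titles are not inspected here, is a zero-width non-joiner or an Arabic variant of a Persian letter. A minimal sketch of a normalizer (the name `normalize_brand` is made up):

```python
import unicodedata

# Characters that render as nothing but still make two strings compare unequal (assumed culprits)
INVISIBLE = {'\u200c', '\u200d', '\u200e', '\u200f', '\ufeff'}
# Arabic yeh and kaf look like their Persian counterparts but have different code points
ARABIC_TO_PERSIAN = str.maketrans({'\u064a': '\u06cc', '\u0643': '\u06a9'})

def normalize_brand(name):
    """Return the brand with invisible characters, letter variants and whitespace unified."""
    name = unicodedata.normalize('NFKC', name)
    name = ''.join(ch for ch in name if ch not in INVISIBLE)
    name = name.translate(ARABIC_TO_PERSIAN)
    # Drop all whitespace, as find_brand does with replace(' ', '')
    return ''.join(name.split())

with_zwnj = 'اچ\u200cپی'   # written with a zero-width non-joiner
with_space = 'اچ پی'       # written with a normal space
print(with_zwnj == with_space)
print(normalize_brand(with_zwnj) == normalize_brand(with_space))
```

If this turns out to be the cause, applying it with `df['Brand'] = df['Brand'].apply(normalize_brand)` before the `groupby` would merge the duplicated rows.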
'''Enter your code here'''
df[(df['Prices'] != 0.0) & (df['Size'] != 0.0)].groupby('Size')['Prices'].mean()
Size
5.0 2.054083e+07
11.0 8.895000e+06
11.6 1.142500e+07
12.4 2.520000e+07
13.0 4.108282e+07
13.3 4.497850e+07
13.4 6.513333e+07
14.0 2.695020e+07
15.0 2.848200e+07
15.6 2.474417e+07
17.3 9.208997e+07
Name: Prices, dtype: float64
Working with Selenium
!conda install -y -c conda-forge selenium
ls command: I use Chrome, so my driver is chromedriver.exe.
ls
Selenium webdriver basics
from selenium import webdriver
import time
''' Change this according to your browser'''
# driver = webdriver.Chrome()
driver = webdriver.Firefox(executable_path='/home/aliiz/Desktop/term3/data-analysis/HWs/HW2/geckodriver')
#driver = webdriver.Edge()
#driver = webdriver.Safari()
# Open this page
driver.get('https://www.alibaba.ir/iranout')
# Go fullscreen
driver.maximize_window()
# After the page has fully loaded, wait 2 seconds
time.sleep(2)
# Open this page
driver.get('https://mrbilit.com/international-flights')
# Go back
driver.back()
time.sleep(2)
# Go forward
driver.forward()
# Close the browser
driver.close()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:6: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
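The DeprecationWarning above asks for a Service object instead of executable_path. A minimal sketch of that style for Firefox in Selenium 4 follows; the function name `make_firefox_driver` and the example path are placeholders, not part of the original notebook.

```python
def make_firefox_driver(driver_path):
    """Start Firefox through a Service object instead of executable_path."""
    # Imported inside the function so that it can be defined before Selenium is installed
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    service = Service(executable_path=driver_path)
    return webdriver.Firefox(service=service)

# Hypothetical usage; replace the path with the location of your own geckodriver:
# driver = make_firefox_driver('/path/to/geckodriver')
# driver.get('https://www.python.org')
# driver.quit()
```

For Chrome the pattern is the same, with `Service` imported from `selenium.webdriver.chrome.service` and `webdriver.Chrome(service=service)`.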
Another Example
driver.find_element_by_name uses the name attribute to find an element. Press TAB after the driver.find_element... part to see the other possibilities. Choose one of these functions after inspecting the desired element in the page source, according to its attributes.
find_element_by_css_selector takes CSS selectors, just like select and select_one in Beautiful Soup.
To get one, simply click on copy selector instead of copy XPath in your browser. Similarly, you can write your own CSS selectors too.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox(executable_path='/home/aliiz/Desktop/term3/data-analysis/HWs/HW2/geckodriver')
driver.get('https://www.google.com/')
driver.maximize_window()
# Find the google search bar by its name
input_1 = driver.find_element_by_name('q')
# Writes 'python' in the search bar
input_1.send_keys('python')
time.sleep(1)
'''
Fill the code below to find the google search button by its class name. You cannot run this code without filling this!
'''
btn = driver.find_element_by_class_name('gNO89b')
# Clicks on the google search button
btn.click()
'''
Instead of clicking that button, we could have pressed ENTER too!
Comment out the lines of finding and clicking the search button above and run the line below instead.
See other possible Key presses too by pressing TAB in front of -Keys-
'''
#input_1.send_keys('python' + Keys.ENTER)
time.sleep(1)
'''
Uncomment only one of the lines below containing -first_link- in each of your runs
Fill this by copying the -XPath- from your browser
'''
# first_link = driver.find_element_by_xpath('//*[@id="rso"]/div[1]/div/div/div/div/div/div/div[1]/a/h3').click()
# You can write your custom XPath
# first_link = driver.find_element_by_xpath('//a[@href="https://www.python.org/"]').click()
'''
Fill this by copying the -selector- from your browser
'''
first_link = driver.find_element_by_css_selector('.eKjLze > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > a:nth-child(1) > h3:nth-child(2)').click()
# You can write your custom CSS selector
#first_link = driver.find_element_by_css_selector('a[href="https://www.python.org/"]').click()
time.sleep(3)
driver.close()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
"""
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:10: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
# Remove the CWD from sys.path while we load stuff.
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:20: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:48: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
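The warnings above say that the find_element_by_* commands are deprecated in favour of find_element(). A short sketch of the replacement style, using the same Google search box as the cell above (the function name `search_google` is made up, and `driver` is assumed to be an already started WebDriver):

```python
def search_google(driver, query):
    """Type a query into Google's search box and submit it with find_element + By."""
    # Imported inside the function so that it can be defined before Selenium is installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    box = driver.find_element(By.NAME, 'q')   # was: driver.find_element_by_name('q')
    box.send_keys(query + Keys.ENTER)

# The other locators follow the same pattern:
#   driver.find_element(By.CLASS_NAME, 'gNO89b')
#   driver.find_element(By.XPATH, '//a[@href="https://www.python.org/"]')
#   driver.find_element(By.CSS_SELECTOR, 'a[href="https://www.python.org/"]')
#   driver.find_elements(By.CSS_SELECTOR, 'h3')   # plural: returns a list
```

The locator strategy becomes an argument instead of part of the method name, which is also the form that the expected conditions in the Waits section use.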
Remove time.sleep(1) from the code above and run it again.
Waits
driver.implicitly_wait(10) sets an implicit wait: whenever an element is looked up, the driver keeps retrying for at most 10 seconds before giving up.
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'gNO89b'))) is an explicit wait: it returns as soon as the condition "element_to_be_clickable" is True and raises a timeout error if that has not happened after 10 seconds. It also introduces a new way of finding elements, using By! Run the code once with the implicit wait and once with the explicit wait.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox(executable_path='/home/aliiz/Desktop/term3/data-analysis/HWs/HW2/geckodriver')
driver.get('https://www.google.com/')
driver.maximize_window()
input_1 = driver.find_element_by_name('q')
input_1.send_keys('python')
# Explicit wait ####################################
'''
press tab after -EC- and -By- to see the other possibilities too
'''
btn = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'gNO89b')))
btn.click()
# Implicit wait ####################################
# driver.implicitly_wait(10)
# btn = driver.find_element_by_class_name('gNO89b')
# btn.click()
time.sleep(3)
driver.close()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:8: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:12: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
if sys.path[0] == '':
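To make the idea behind an explicit wait concrete, here is a plain-Python sketch of a polling loop. It is not Selenium's implementation, only the same idea: call a condition again and again until it returns something truthy, and give up after a timeout. The names `wait_until` and `ready` are made up for this example.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy or the time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} seconds'.format(timeout))
        time.sleep(poll)

# Toy condition that becomes true on the third check
calls = {'n': 0}
def ready():
    calls['n'] += 1
    return calls['n'] >= 3

print(wait_until(ready, timeout=2, poll=0.01))
print(calls['n'])
```

The difference from time.sleep is that the wait ends as soon as the condition holds, so a fast page costs almost nothing and a slow page still gets its full timeout.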
First, by clicking the link, the script goes to the old version of the site, where it has to select an option and a satisfaction rating.
Then it specifies the ticket information (international flight, one-way, origin, destination and date) and clicks search.
After the page has loaded, it scrolls as far as possible so that the information of all flights is visible.
Then it gets all the flights by their class and extracts the following information in two parts:
Then, by clicking on each flight, it gets the trip duration and the number of the first flight, and closes the popup and finally the whole page. For rerunning the code, the destination was changed to Yamagata, Japan, and the date to the 8th of Aban.
New concepts in this task: find_elements, which is the equivalent of find_all in Beautiful Soup; scrolling, which is done by executing JavaScript code; and using the mouse, which is done with ActionChains.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
driver = webdriver.Firefox(executable_path='/home/aliiz/Desktop/term3/data-analysis/HWs/HW2/geckodriver')
driver.get('https://www.alibaba.ir/iranout')
driver.maximize_window()
origin_city = 'tehran'
destination_city = 'japan'
old_website_css = '#header > div.bg-grays-150.text-grays-700.text-3.py-3 > div > button.btn.is-raw.is-rounded.bg-grays-200.px-3.py-1.mr-4'
old_website_star_css = '#modal_container > div > div > div.flex.w-full.items-center.feedback__star-wrapper > div > button:nth-child(5)'
old_website_reason_css = '#modal_container > div > div > div:nth-child(7) > button:nth-child(6)'
old_website_submit_css = r'#modal_container > div > div > button.btn.is-secondary.sm\:mx-auto.mt-auto.py-3.px-4'
foreign_flights_css = '#search-panels > div.panels.relative > div > ul > li:nth-child(2) > a'
one_way_css = '#search-panels > div.panels.relative > div > div > form > div.gap > span:nth-child(1) > label'
origin_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-5 > div > div:nth-child(1) > input'
origin_dropdown_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-5 > div > div:nth-child(1) > div > ul > li.hover > a'
destination_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-5 > div > div:nth-child(3) > input'
destination_dropdown_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-5 > div > div:nth-child(3) > div > ul > li.hover > a'
# 8th of Aban (nth-child = 2 & div.calendar__day.first)
date_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-3.col-lg-3.search-date > div > div.alibaba-datepicker__wrapper.v-dropdown.open.fade > div > div.alibaba-datepicker__container.slide-left > div:nth-child(1) > div > div > div.calendar__container > div:nth-child(8)'
date_btn_css = '#search-panels > div.panels.relative > div > div > form > div.col-xs-12.col-md-3.col-lg-3.search-date > div > div.alibaba-datepicker__wrapper.v-dropdown.open.fade > div > footer > div:nth-child(2) > button'
search_btn_css = '#search-panels > div.panels.relative > div > div > form > div:nth-child(5) > button'
loading_screen_css = '#alibaba_sidebar > div.px-2.col-md-8.col-lg-9 > div > div:nth-child(1) > div.available.row.international-available > div.loading-banner'
airline_class = 'airlines-columns'
departure_class = 'col-sm-3'
arrival_class = 'col-sm-4'
price_css = 'div.col-xs-6.footer-pricing.col-md-12.col-sm-12 > span.pricing-value > span.w-bold'
flight_duration_css = '#alibaba_sidebar > div.px-2.col-md-8.col-lg-9 > div > div:nth-child(2) > div.available.row.international-available.isCompleted > div.modal.details-modal__parent.show.fade.in > div > div > div > div > div.flex.flex-column.flex-1.details-modal__bg.col-xs-12.col-sm-9 > div.flex.flex-1.details-modal__details > div > div > div > div > div > div > div > div > div.trace-route-details > div.trace-route-details__body > div.row.flex.trace-route-details__route > div.col-xs-9 > div > div:nth-child(2) > span'
flight_number_css = '#alibaba_sidebar > div.px-2.col-md-8.col-lg-9 > div > div:nth-child(2) > div.available.row.international-available.isCompleted > div.modal.details-modal__parent.show.fade.in > div > div > div > div > div.flex.flex-column.flex-1.details-modal__bg.col-xs-12.col-sm-9 > div.flex.flex-1.details-modal__details > div > div > div > div > div > div > div > div > div.trace-route-details > div.trace-route-details__body > div:nth-child(2) > div.trace-route-details__timeline--content > div:nth-child(2) > div > span:nth-child(1) > span > span'
waits = WebDriverWait(driver, 10)
search_waits = WebDriverWait(driver, 100)
# Go to the old website
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, old_website_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, old_website_star_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, old_website_reason_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, old_website_submit_css))).click()
# Fill the form
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, foreign_flights_css))).click()
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, one_way_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, origin_css))).send_keys(origin_city)
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, origin_dropdown_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, destination_css))).send_keys(destination_city)
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, destination_dropdown_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, date_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, date_btn_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, search_btn_css))).click()
# Wait until the search has finished
search_waits.until(EC.visibility_of_element_located((By.CSS_SELECTOR, loading_screen_css)))
search_waits.until(EC.invisibility_of_element((By.CSS_SELECTOR, loading_screen_css)))
# Scroll down to fetch all data
speed = 30
current_scroll_position, new_height = 0, 1
while current_scroll_position < new_height:
    time.sleep(0.1)
    current_scroll_position += speed
    # JavaScript code to scroll to a position
    driver.execute_script("window.scrollTo(0, {});".format(current_scroll_position))
    # JavaScript code to return page height
    new_height = driver.execute_script("return document.body.scrollHeight")
# Loop over flights
flights = driver.find_elements_by_css_selector('div.international-available__columns')
airlines = []
departures = []
arrivals = []
prices = []
durations = []
flight_numbers = []
for flight in flights:
    # Initial info
    time.sleep(0.5)
    airlines.append(flight.find_element_by_class_name(airline_class).text)
    departures.append(flight.find_element_by_class_name(departure_class).text)
    arrivals.append(flight.find_element_by_class_name(arrival_class).text)
    prices.append(flight.find_element_by_css_selector(price_css).text)
    time.sleep(1)
    # Click to see more details
    flight.click()
    time.sleep(1)
    durations.append(waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, flight_duration_css))).text)
    flight_numbers.append(waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, flight_number_css))).text)
    # Click away from the popup window
    webdriver.ActionChains(driver).move_by_offset(0, 0).double_click().perform()
driver.close()
# Write data to CSV
product = {'Airline': airlines,
'Departure': departures,
'Arrival': arrivals,
'Price': prices,
'Duration': durations,
'Flight_num': flight_numbers}
data = pd.DataFrame.from_dict(product, orient='index')
data = data.transpose()
data.to_csv('alibaba_flights.csv', index=False)
data.head()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:10: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
# Remove the CWD from sys.path while we load stuff.
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:87: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
/home/aliiz/dsenv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py:445: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
/home/aliiz/dsenv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py:483: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
|  | Airline | Departure | Arrival | Price | Duration | Flight_num |
|---|---|---|---|---|---|---|
| 0 | چند ایرلاین | 03:10\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 207,345,475 | 47 ساعت و 35 دقیقه | 7689 |
| 1 | چند ایرلاین | 07:55\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 287,290,395 | 42 ساعت و 50 دقیقه | 873 |
| 2 | چند ایرلاین | 07:35\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 287,290,395 | 43 ساعت و 10 دقیقه | 879 |
| 3 | چند ایرلاین | 03:25\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 316,309,980 | 47 ساعت و 20 دقیقه | 881 |
| 4 | چند ایرلاین | 22:35\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 343,128,430 | 28 ساعت و 10 دقیقه | 499 |
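The scrolling loop from the cell above can be wrapped in a reusable function. The sketch below follows the same logic; the function name `scroll_to_bottom` is made up, and `FakePage` is only a stand-in for a real WebDriver so that the loop can be tried without opening a browser.

```python
import time

def scroll_to_bottom(driver, step=30, pause=0.1):
    """Scroll down in small steps until the bottom of a (possibly growing) page is reached."""
    position, height = 0, 1
    while position < height:
        time.sleep(pause)
        position += step
        # arguments[0] is how execute_script passes Python values into the JavaScript code
        driver.execute_script("window.scrollTo(0, arguments[0]);", position)
        # Read the height again each time, because lazily loaded results make the page longer
        height = driver.execute_script("return document.body.scrollHeight")
    return position

class FakePage:
    """Stand-in for a WebDriver, for illustration only: the page grows once, then stops."""
    def __init__(self):
        self.height, self.grown = 100, False

    def execute_script(self, script, *args):
        if script.startswith("return"):
            return self.height
        if args[0] >= 90 and not self.grown:
            # Pretend that scrolling near the bottom loaded more flights
            self.height, self.grown = 200, True

print(scroll_to_bottom(FakePage(), step=30, pause=0))
```

With a real driver the call is simply `scroll_to_bottom(driver)`. A smaller step is slower but gives the page more chances to load new results before the loop reaches the current bottom.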
df = pd.read_csv('alibaba_flights.csv')
df
|  | Airline | Departure | Arrival | Price | Duration | Flight_num |
|---|---|---|---|---|---|---|
| 0 | چند ایرلاین | 03:10\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 207,345,475 | 47 ساعت و 35 دقیقه | 7689 |
| 1 | چند ایرلاین | 07:55\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 287,290,395 | 42 ساعت و 50 دقیقه | 873 |
| 2 | چند ایرلاین | 07:35\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 287,290,395 | 43 ساعت و 10 دقیقه | 879 |
| 3 | چند ایرلاین | 03:25\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 316,309,980 | 47 ساعت و 20 دقیقه | 881 |
| 4 | چند ایرلاین | 22:35\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 343,128,430 | 28 ساعت و 10 دقیقه | 499 |
| 5 | چند ایرلاین | 15:10\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 343,128,430 | 35 ساعت و 35 دقیقه | 483 |
| 6 | چند ایرلاین | 04:40\nتهران ( IKA ) | 08:15\nYamagata ( GAJ ) | 343,971,430 | 46 ساعت و 5 دقیقه | 491 |
| 7 | چند ایرلاین | 01:45\nتهران ( IKA ) | 18:40\nYamagata ( GAJ ) | 1,767,985,690 | 35 ساعت و 25 دقیقه | 601 |
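The Departure and Arrival cells above hold a time and an airport separated by a newline, and the prices are strings with thousands separators. Before such columns can be compared they have to be split and converted. A small sketch with made-up helper names (`split_stop`, `price_to_int`), tried on values copied from the table:

```python
import re

def split_stop(cell):
    """Split a cell such as '03:10', newline, 'تهران ( IKA )' into (time, city, airport_code)."""
    clock, place = cell.split('\n', 1)
    # The city name is followed by a three-letter airport code in parentheses
    match = re.match(r'\s*(.*?)\s*\(\s*([A-Z]{3})\s*\)\s*$', place)
    city, code = match.groups() if match else (place.strip(), '')
    return clock.strip(), city, code

def price_to_int(text):
    """Turn '207,345,475' into the integer 207345475."""
    return int(text.replace(',', ''))

print(split_stop('03:10\nتهران ( IKA )'))
print(split_stop('08:15\nYamagata ( GAJ )'))
print(price_to_int('207,345,475'))
```

Splitting out the airport code gives a value that is spelled the same on both sites, which the city names are not.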
Crawling Mrbilit
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
import re
driver = webdriver.Firefox(executable_path='/home/aliiz/Desktop/term3/data-analysis/HWs/HW2/geckodriver')
driver.get('https://mrbilit.com/')
driver.maximize_window()
origin_city = 'tehran'
destination_city = 'japan'
foreign_flights_css = '#form-plane > div.mode-container.transform-in-mobile-size > div > div:nth-child(2) > div.mode-radio-circle-container'
origin_xpath = '/html/body/div/div/div/div[2]/div[1]/div[2]/div/div/form[1]/div[4]/div[1]/div[1]/div/div/div/div[1]/div[2]/div[1]/input'
origin_dropdown_css = '#form-plane > div.search-form-rows > div.row.org-dest > div.city-container.city-container-origin > div > div > div > div.v-select.form-control.from.isSelectpicker.select-mobile-fullscreen.vs--open.vs--single.vs--searching.vs--searchable > ul > li:nth-child(4)'
destination_xpath = '/html/body/div/div/div/div[2]/div[1]/div[2]/div/div/form[1]/div[4]/div[1]/div[2]/div/div/div/div[1]/div[2]/div[1]/input'
destination_dropdown_css = '#form-plane > div.search-form-rows > div.row.org-dest > div.city-container.city-container-destination > div > div > div > div.v-select.form-control.from.isSelectpicker.select-mobile-fullscreen.vs--open.vs--single.vs--searching.vs--searchable > ul > li:nth-child(6)'
# 8th of Aban (nth-child = 2 & div.calendar__day.first)
date_css = '#form-plane > div.search-form-rows > div.row.date-row > div > div.datepicker-wrapper > div.datepicker-container > div.datepicker-body > div > div:nth-child(2) > div.month-days-container > div:nth-child(8) > div'
date_btn_css = '#form-plane > div.search-form-rows > div.row.date-row > div > div.datepicker-wrapper > div.datepicker-container > div.datepicker-actions > button.mr-button.datepicker-btn.lg.filled'
search_btn_css = '#checkbox-row > button'
loaded_screen_css = '#__nuxt__ > div.view-panel-container > div.view-body > div.cards-container'
waits = WebDriverWait(driver, 10)
search_waits = WebDriverWait(driver, 100)
# Fill the form
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, foreign_flights_css))).click()
time.sleep(2)
driver.find_element_by_xpath(origin_xpath).send_keys(origin_city)
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, origin_dropdown_css))).click()
driver.find_element_by_xpath(destination_xpath).send_keys(destination_city)
time.sleep(2)
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, destination_dropdown_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, date_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, date_btn_css))).click()
waits.until(EC.element_to_be_clickable((By.CSS_SELECTOR, search_btn_css))).click()
# Wait until the result cards are visible (loading_screen_css is not defined for this site)
search_waits.until(EC.visibility_of_element_located((By.CSS_SELECTOR, loaded_screen_css)))
time.sleep(2)
# Scroll down to fetch all data
speed = 30
current_scroll_position, new_height = 0, 1
while current_scroll_position < new_height:
    time.sleep(0.1)
    current_scroll_position += speed
    # JavaScript code to scroll to a position
    driver.execute_script("window.scrollTo(0, {});".format(current_scroll_position))
    # JavaScript code to return page height
    new_height = driver.execute_script("return document.body.scrollHeight")
# Loop over flights
flights = driver.find_elements_by_class_name('card-wrapper')
airlines = []
departures_airport = []
departures_time = []
arrivals_airport = []
arrivals_time = []
prices = []
durations = []
flight_numbers = []
airline_class = 'logo-text'
departure_airport_class = 'trip-route-origin'
departure_time_class = 'departure-time'
arrival_airport_class = 'trip-route-destination'
arrival_time_class = 'arrival-time'
price_class = 'buy-payable-price'
flight_duration_class = 'trip-route-arrow-stops'
for flight in flights:
    # Initial info
    time.sleep(0.5)
    airlines.append(flight.find_element_by_class_name(airline_class).text)
    departures_airport.append(flight.find_element_by_class_name(departure_airport_class).text)
    departures_time.append(flight.find_element_by_class_name(departure_time_class).text)
    arrivals_airport.append(flight.find_element_by_class_name(arrival_airport_class).text)
    arrivals_time.append(flight.find_element_by_class_name(arrival_time_class).text)
    prices.append(flight.find_element_by_class_name(price_class).text)
    durations.append(flight.find_element_by_class_name(flight_duration_class).text)
    # Click to see more details
    flight.find_element_by_class_name('more-details').click()
    detail = flight.find_element_by_class_name('card-info')
    flight_numbers.append(re.findall('[0-9]+', detail.find_element_by_class_name('logo-text').text)[0])
time.sleep(2)
driver.close()
# Write data to CSV
product = {'Airline': airlines,
'Departure_airport': departures_airport,
'Departure_time': departures_time,
'Arrival_airport': arrivals_airport,
'Arrival_time': arrivals_time,
'Price': prices,
'Duration': durations,
'Flight_number': flight_numbers}
data = pd.DataFrame.from_dict(product, orient='index')
data = data.transpose().drop_duplicates()
data.to_csv('mrblit_flights.csv', index=False)
data.head()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:11: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
# This is added back by InteractiveShellApp.init_path()
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:40: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:44: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element() instead
/home/aliiz/dsenv/lib/python3.7/site-packages/ipykernel_launcher.py:70: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
/home/aliiz/dsenv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py:445: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
|  | Airline | Departure_airport | Departure_time | Arrival_airport | Arrival_time | Price | Duration | Flight_number |
|---|---|---|---|---|---|---|---|---|
| 0 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 08:15 | 211,543,000 | 53 ساعت و 5 دقیقه | 7689 |
| 1 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 18:40 | 211,543,000 | 63 ساعت و 30 دقیقه | 7689 |
| 2 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 20 دقیقه | 873 |
| 3 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 18:40 | 292,377,000 | 58 ساعت و 45 دقیقه | 873 |
| 4 | ترکیش ایرلاینز | تهران (IKA) | 07:35 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 40 دقیقه | 879 |
data = pd.read_csv('mrblit_flights.csv')
data
|  | Airline | Departure_airport | Departure_time | Arrival_airport | Arrival_time | Price | Duration | Flight_number |
|---|---|---|---|---|---|---|---|---|
| 0 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 08:15 | 211,543,000 | 53 ساعت و 5 دقیقه | 7689 |
| 1 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 18:40 | 211,543,000 | 63 ساعت و 30 دقیقه | 7689 |
| 2 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 20 دقیقه | 873 |
| 3 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 18:40 | 292,377,000 | 58 ساعت و 45 دقیقه | 873 |
| 4 | ترکیش ایرلاینز | تهران (IKA) | 07:35 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 40 دقیقه | 879 |
| 5 | ترکیش ایرلاینز | تهران (IKA) | 07:35 | استان یاماگاتا (GAJ) | 18:40 | 292,377,000 | 59 ساعت و 5 دقیقه | 879 |
| 6 | ترکیش ایرلاینز | تهران (IKA) | 03:25 | استان یاماگاتا (GAJ) | 08:15 | 321,767,000 | 52 ساعت و 50 دقیقه | 881 |
| 7 | ترکیش ایرلاینز | تهران (IKA) | 03:25 | استان یاماگاتا (GAJ) | 18:40 | 321,767,000 | 63 ساعت و 15 دقیقه | 881 |
| 8 | قطر ایرویز | تهران (IKA) | 22:35 | استان یاماگاتا (GAJ) | 08:15 | 331,299,000 | 33 ساعت و 40 دقیقه | 499 |
| 9 | قطر ایرویز | تهران (IKA) | 22:35 | استان یاماگاتا (GAJ) | 18:40 | 331,299,000 | 44 ساعت و 5 دقیقه | 499 |
| 10 | قطر ایرویز | تهران (IKA) | 15:10 | استان یاماگاتا (GAJ) | 18:40 | 331,299,000 | 51 ساعت و 30 دقیقه | 483 |
| 11 | قطر ایرویز | تهران (IKA) | 15:10 | استان یاماگاتا (GAJ) | 08:15 | 331,299,000 | 41 ساعت و 5 دقیقه | 483 |
| 12 | قطر ایرویز | تهران (IKA) | 04:40 | استان یاماگاتا (GAJ) | 08:15 | 332,175,000 | 51 ساعت و 35 دقیقه | 491 |
| 13 | قطر ایرویز | تهران (IKA) | 04:40 | استان یاماگاتا (GAJ) | 18:40 | 332,175,000 | 62 ساعت | 491 |
| 14 | ترکیش ایرلاینز | تهران (IKA) | 02:20 | استان یاماگاتا (GAJ) | 08:15 | 742,307,000 | 53 ساعت و 55 دقیقه | 875 |
| 15 | ترکیش ایرلاینز | تهران (IKA) | 02:20 | استان یاماگاتا (GAJ) | 18:40 | 742,307,000 | 64 ساعت و 20 دقیقه | 875 |
| 16 | لوفتهانزا | تهران (IKA) | 01:45 | استان یاماگاتا (GAJ) | 18:40 | 1,544,576,000 | 40 ساعت و 55 دقیقه | 601 |
| 17 | لوفتهانزا | تهران (IKA) | 01:45 | استان یاماگاتا (GAJ) | 08:15 | 1,544,576,000 | 54 ساعت و 30 دقیقه | 601 |
| 18 | لوفتهانزا | تهران (IKA) | 01:45 | استان یاماگاتا (GAJ) | 08:15 | 1,580,588,000 | 54 ساعت و 30 دقیقه | 601 |
| 19 | لوفتهانزا | تهران (IKA) | 01:45 | استان یاماگاتا (GAJ) | 18:40 | 1,580,588,000 | 40 ساعت و 55 دقیقه | 601 |
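The Duration column above is Persian text ("53 ساعت و 5 دقیقه" means 53 hours and 5 minutes), so it cannot be sorted or averaged directly. A minimal sketch that converts it to minutes; the function name `duration_to_minutes` is made up, and the samples are copied from the table:

```python
import re

def duration_to_minutes(text):
    """Convert a duration such as '53 ساعت و 5 دقیقه' to a total number of minutes."""
    hours = re.search(r'(\d+)\s*ساعت', text)      # number before the word for "hour"
    minutes = re.search(r'(\d+)\s*دقیقه', text)   # number before the word for "minute"
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

for sample in ['53 ساعت و 5 دقیقه', '62 ساعت', '40 ساعت و 55 دقیقه']:
    print(sample, '->', duration_to_minutes(sample))
```

Both parts are optional in the pattern because some rows, like the "62 ساعت" one, have no minutes part.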
Cleaning & Joining the tables
'''Enter your code here'''
alibaba = df.copy()
mrblit = data.copy()
alibaba = alibaba.rename(columns={'Flight_num':'Flight_number'})
alibaba['Departure_airport'] = alibaba['Departure'].apply(lambda s: s.split('\n')[1])
alibaba['Departure_time'] = alibaba['Departure'].apply(lambda s: s.split('\n')[0])
del alibaba['Departure']
alibaba['Arrival_airport'] = alibaba['Arrival'].apply(lambda s: s.split('\n')[1])
alibaba['Arrival_time'] = alibaba['Arrival'].apply(lambda s: s.split('\n')[0])
del alibaba['Arrival']
non_duplicate_columns = list(set(alibaba.columns) - {'Airline', 'Arrival_airport', 'Departure_airport', 'Duration'})
merged = pd.merge(mrblit.drop_duplicates(), alibaba[non_duplicate_columns], how='left', on=['Flight_number', 'Departure_time', 'Arrival_time'],
suffixes=('_MrBilit', '_Alibaba'))
print(merged.info())
merged.head(5)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20 entries, 0 to 19
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Airline 20 non-null object
1 Departure_airport 20 non-null object
2 Departure_time 20 non-null object
3 Arrival_airport 20 non-null object
4 Arrival_time 20 non-null object
5 Price_MrBilit 20 non-null object
6 Duration 20 non-null object
7 Flight_number 20 non-null int64
8 Price_Alibaba 9 non-null object
dtypes: int64(1), object(8)
memory usage: 1.6+ KB
None
|  | Airline | Departure_airport | Departure_time | Arrival_airport | Arrival_time | Price_MrBilit | Duration | Flight_number | Price_Alibaba |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 08:15 | 211,543,000 | 53 ساعت و 5 دقیقه | 7689 | 207,345,475 |
| 1 | ترکیش ایرلاینز | تهران (IKA) | 03:10 | استان یاماگاتا (GAJ) | 18:40 | 211,543,000 | 63 ساعت و 30 دقیقه | 7689 | NaN |
| 2 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 20 دقیقه | 873 | 287,290,395 |
| 3 | ترکیش ایرلاینز | تهران (IKA) | 07:55 | استان یاماگاتا (GAJ) | 18:40 | 292,377,000 | 58 ساعت و 45 دقیقه | 873 | NaN |
| 4 | ترکیش ایرلاینز | تهران (IKA) | 07:35 | استان یاماگاتا (GAJ) | 08:15 | 292,377,000 | 48 ساعت و 40 دقیقه | 879 | 287,290,395 |
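Both price columns of the merged table are strings, so they have to be converted to numbers before they are compared. The sketch below does that for a few flights copied from the two tables above, matched on the same three keys as the merge (flight number, departure time, arrival time). The names `to_int`, `mrbilit_sample` and `alibaba_sample` are made up for this example.

```python
def to_int(price):
    """'211,543,000' -> 211543000"""
    return int(price.replace(',', ''))

# (Flight_number, Departure_time, Arrival_time) -> price string, copied from the tables above
mrbilit_sample = {
    (7689, '03:10', '08:15'): '211,543,000',
    (873, '07:55', '08:15'): '292,377,000',
    (499, '22:35', '08:15'): '331,299,000',
    (483, '15:10', '08:15'): '331,299,000',
}
alibaba_sample = {
    (7689, '03:10', '08:15'): '207,345,475',
    (873, '07:55', '08:15'): '287,290,395',
    (499, '22:35', '08:15'): '343,128,430',
    (483, '15:10', '08:15'): '343,128,430',
}

# Comparing the raw strings goes character by character, which is misleading:
print('1,544,576,000' < '207,345,475')   # True, although the first number is larger

for key, price in mrbilit_sample.items():
    if key in alibaba_sample:   # keep only flights found on both sites
        diff = to_int(price) - to_int(alibaba_sample[key])
        cheaper = 'Alibaba' if diff > 0 else 'Mrbilit' if diff < 0 else 'same'
        print(key, diff, cheaper)
```

In pandas the same conversion can be done on a whole column with `.str.replace(',', '').astype('int64')`.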
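No. For these flights, neither site is always cheaper: Alibaba is cheaper for flights 7689, 873, 879 and 881, and Mrbilit is cheaper for flights 499, 483, 491 and 601.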
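import matplotlib.pyplot as plt
# The prices are strings such as '211,543,000'; convert them to integers before plotting
matched = merged.dropna()
plt.plot(list(matched['Price_MrBilit'].str.replace(',', '').astype('int64')), label='MrBilit')
plt.plot(list(matched['Price_Alibaba'].str.replace(',', '').astype('int64')), label='Alibaba')
plt.legend()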
<matplotlib.legend.Legend at 0x7fa6982ced10>
import matplotlib.pyplot as plt
%matplotlib inline
mrblit['hour'] = mrblit['Departure_time'].apply(lambda s: s.split(':')[0])
mrblit['Price'] = mrblit['Price'].apply(lambda s: int(''.join(s.split(','))))
mrblit.groupby('hour')['Price'].mean()
alibaba['hour'] = alibaba['Departure_time'].apply(lambda s: s.split(':')[0])
alibaba['Price'] = alibaba['Price'].apply(lambda s: int(''.join(s.split(','))))
alibaba.groupby('hour')['Price'].mean()
hour
01 1.767986e+09
03 2.618277e+08
04 3.439714e+08
07 2.872904e+08
15 3.431284e+08
22 3.431284e+08
Name: Price, dtype: float64
mrbilit_hours = list(mrblit.groupby('hour')['Price'].mean().index)
mrblit_mean_prices = list(mrblit.groupby('hour')['Price'].mean())
alibaba_hours = list(alibaba.groupby('hour')['Price'].mean().index)
alibaba_mean_prices = list(alibaba.groupby('hour')['Price'].mean())
plt.bar(mrbilit_hours, mrblit_mean_prices)
plt.title('Mrbilit')
plt.figure()
plt.title('Alibaba')
plt.bar(alibaba_hours, alibaba_mean_prices)
<BarContainer object of 6 artists>